Uploaded image for project: 'WildFly WIP'
  1. WildFly WIP
  2. WFWIP-206

scale down isn't successful when there are in-doubt transaction on pod

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Blocker
    • OpenShift
    • Hide

      it can be reproduced using test:

      git clone git@gitlab.mw.lab.eng.bos.redhat.com:msimka/openshift-eap-tests.git
      cd openshift-eap-tests.git
      git checkout EAP7-1192_scaledown
      # create file test.properties
      mvn clean test -P72 -Dtest=EjbTxnRemotingScaleDownTest -Dcheckstyle.skip -Dconsole-log-level=DEBUG
      

      test.properties

      xtf.openshift.url=https://master.all-in-one-msimka-002.dynamic.xpaas:8443
      xtf.openshift.namespace=wip-namespace
      xtf.bm.namespace=wip-builds-namespace
      
      xtf.eap.72.image=docker-registry.engineering.redhat.com/ochaloup/wildfly18-snapshot:190909-d4ddf04cc2-wfcore-10.0.0.Beta7-SNAPSHOT
      xtf.eap.72.properties.eap.imagestream.name=jboss-eap73-openshift
      
      xtf.eap.72.version=7.2.0.GA
      xtf.eap.properties.location=/opt/eap
      xtf.eap.72.templates.repo=git://github.com/jboss-container-images/jboss-eap-7-openshift-image.git,git://github.com/jboss-container-images/redhat-sso-7-openshift-image.git
      xtf.eap.72.templates.branch=eap72,v7.3.0.GA
      
      xtf.maven.proxy.url=http://maven.all-in-one-043.dynamic.xpaas/nexus/content/groups/public/
      
      # this might be needed if oc login command fails, see test suite logs
      #xtf.openshift.binary.path=/home/msimka/.minishift/cache/oc/v3.11.0/linux/oc
      xtf.operator.image=docker-registry.engineering.redhat.com/jbossqe-eap/wildfly-operator:latest
      xtf.operator.resources.context=
      
      # specify correct path oc binary
      xtf.openshift.binary.path=<oc_binary_path>
      
      Show
      it can be reproduced using test: git clone git@gitlab.mw.lab.eng.bos.redhat.com:msimka/openshift-eap-tests.git cd openshift-eap-tests.git git checkout EAP7-1192_scaledown # create file test.properties mvn clean test -P72 -Dtest=EjbTxnRemotingScaleDownTest -Dcheckstyle.skip -Dconsole-log-level=DEBUG test.properties xtf.openshift.url=https: //master.all-in-one-msimka-002.dynamic.xpaas:8443 xtf.openshift.namespace=wip-namespace xtf.bm.namespace=wip-builds-namespace xtf.eap.72.image=docker-registry.engineering.redhat.com/ochaloup/wildfly18-snapshot:190909-d4ddf04cc2-wfcore-10.0.0.Beta7-SNAPSHOT xtf.eap.72.properties.eap.imagestream.name=jboss-eap73-openshift xtf.eap.72.version=7.2.0.GA xtf.eap.properties.location=/opt/eap xtf.eap.72.templates.repo=git: //github.com/jboss-container-images/jboss-eap-7-openshift-image.git,git://github.com/jboss-container-images/redhat-sso-7-openshift-image.git xtf.eap.72.templates.branch=eap72,v7.3.0.GA xtf.maven.proxy.url=http: //maven.all-in-one-043.dynamic.xpaas/nexus/content/groups/ public / # this might be needed if oc login command fails, see test suite logs #xtf.openshift.binary.path=/home/msimka/.minishift/cache/oc/v3.11.0/linux/oc xtf. operator .image=docker-registry.engineering.redhat.com/jbossqe-eap/wildfly- operator :latest xtf. operator .resources.context= # specify correct path oc binary xtf.openshift.binary.path=<oc_binary_path>

    Description

      While testing tx recovery in OpenShift I see that scale down of pod that has transaction in-doubt on it isn't successful

      Scenario:

      ejb client (app tx-client, pod tx-client-0):

      • EJB business method
        • lookup remote EJB
        • enlist XA resource 1 to transaction
        • enlist XA resource 2 to transaction
        • call remote EJB

      ejb server (app tx-server, pod tx-server-0):

      • EJB business method
        • enlist XA resource 1 to transaction
        • enlist XA resource 2 to transaction

      ejb server XA resource 2 fails with XAException(XAException.XAER_RMFAIL)

      Then the test calls scale down (size from 1 to 0) on tx-server pod. But scale down never completes.

      Log from operator:

      {"level":"info","ts":1568905676.6303623,"logger":"controller_wildflyserver","msg":"Transaction recovery scaledown processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod Name":"tx-server-0","IP Address":"172.17.0.10"}
      {"level":"info","ts":1568905676.7313502,"logger":"controller_wildflyserver","msg":"Enabling recovery listener for processing scaledown at tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
      {"level":"info","ts":1568905679.4325309,"logger":"controller_wildflyserver","msg":"Query to find the transaction recovery port to force scan at pod tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
      {"level":"info","ts":1568905686.7914035,"logger":"controller_wildflyserver","msg":"Executing recovery scan at tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod IP":"172.17.0.10","Recovery port":4712}
      {"level":"info","ts":1568905702.0583296,"logger":"controller_wildflyserver","msg":"In-doubt transactions in object store","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod Name":"tx-server-0","Message":"Recovery scan to be invoked as the transaction log storage is not empty for pod scaling down pod tx-server-0, transaction list: map[0:ffffac11000a:991c183:5d8399a1:13:map[age-in-seconds:23 id:0:ffffac11000a:991c183:5d8399a1:13 jmx-name:<nil> participants:map[java:/MockXAResource:map[eis-product-name:MockXAResource Test eis-product-version:0.1.Mock jmx-name:<nil> jndi-name:java:/MockXAResource status:PREPARED type:/StateManager/AbstractRecord/XAResourceRecord]] type:StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/SubordinateAtomicAction/JCA]]"}
      {"level":"info","ts":1568905711.1034026,"logger":"controller_wildflyserver","msg":"Reconciling WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
      {"level":"info","ts":1568905711.103548,"logger":"controller_wildflyserver","msg":"Transaction recovery scaledown processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod Name":"tx-server-0","IP Address":"172.17.0.10"}
      {"level":"info","ts":1568905711.109706,"logger":"controller_wildflyserver","msg":"Executing recovery scan at tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod IP":"172.17.0.10","Recovery port":4712}
      {"level":"error","ts":1568905711.2608829,"logger":"controller_wildflyserver","msg":"Failures during scaling down recovery processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Desired replica size":0,"Number of pods to be removed":1,"error":"Found 1 errors:\n [[Failed to run transaction recovery scan for scaling down pod tx-server-0. Please, verify the pod log file. Error: Error to get response for command SCAN sending to 172.17.0.10:4712, error: EOF]],","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/wildfly/wildfly-operator/pkg/controller/wildflyserver.(*ReconcileWildFlyServer).Reconcile\n\t/go/src/github.com/wildfly/wildfly-operator/pkg/controller/wildflyserver/wildflyserver_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
      {"level":"info","ts":1568905711.2609518,"logger":"controller_wildflyserver","msg":"Scaling down statefulset by verification if pods are clean by recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      {"level":"info","ts":1568905711.2609615,"logger":"controller_wildflyserver","msg":"Statefulset was not fully scaled to the desired replica size 0 while StatefulSet is to be at size 1. Some pods were not cleaned by recovery. Verify status of the WildflyServer tx-server","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      {"level":"info","ts":1568905711.2795022,"logger":"controller_wildflyserver","msg":"Updating StatefulSet to be up to date with the WildFlyServer Spec","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      {"level":"info","ts":1568905711.2795491,"logger":"controller_wildflyserver","msg":"Reconciling WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
      {"level":"info","ts":1568905711.2796504,"logger":"controller_wildflyserver","msg":"Transaction recovery scaledown processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod Name":"tx-server-0","IP Address":"172.17.0.10"}
      {"level":"info","ts":1568905711.2937052,"logger":"controller_wildflyserver","msg":"Executing recovery scan at tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod IP":"172.17.0.10","Recovery port":4712}
      {"level":"error","ts":1568905711.294249,"logger":"controller_wildflyserver","msg":"Failures during scaling down recovery processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Desired replica size":0,"Number of pods to be removed":1,"error":"Found 1 errors:\n [[Failed to run transaction recovery scan for scaling down pod tx-server-0. Please, verify the pod log file. Error: Cannot process TCP connection to 172.17.0.10:4712, error: dial tcp 172.17.0.10:4712: connect: connection refused]],","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/wildfly/wildfly-operator/pkg/controller/wildflyserver.(*ReconcileWildFlyServer).Reconcile\n\t/go/src/github.com/wildfly/wildfly-operator/pkg/controller/wildflyserver/wildflyserver_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
      {"level":"info","ts":1568905711.294342,"logger":"controller_wildflyserver","msg":"Scaling down statefulset by verification if pods are clean by recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      {"level":"info","ts":1568905711.294417,"logger":"controller_wildflyserver","msg":"Statefulset was not fully scaled to the desired replica size 0 while StatefulSet is to be at size 1. Some pods were not cleaned by recovery. Verify status of the WildflyServer tx-server","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      {"level":"error","ts":1568905711.311673,"logger":"controller_wildflyserver","msg":"Failed to Update StatefulSet.","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server","error":"Operation cannot be fulfilled on statefulsets.apps \"tx-server\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/wildfly/wildfly-operator/pkg/controller/wildflyserver.(*ReconcileWildFlyServer).checkStatefulSet\n\t/go/src/github.com/wildfly/wildfly-operator/pkg/controller/wildflyserver/wildflyserver_controller.go:470\ngithub.com/wildfly/wildfly-operator/pkg/controller/wildflyserver.(*ReconcileWildFlyServer).Reconcile\n\t/go/src/github.com/wildfly/wildfly-operator/pkg/controller/wildflyserver/wildflyserver_controller.go:247\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
      {"level":"error","ts":1568905711.311745,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"wildflyserver-controller","request":"msimka-namespace/tx-server","error":"Operation cannot be fulfilled on statefulsets.apps \"tx-server\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
      {"level":"info","ts":1568905712.3137681,"logger":"controller_wildflyserver","msg":"Reconciling WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-server"}
      {"level":"info","ts":1568905712.3139439,"logger":"controller_wildflyserver","msg":"Transaction recovery scaledown processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod Name":"tx-server-0","IP Address":"172.17.0.10"}
      {"level":"info","ts":1568905712.3253288,"logger":"controller_wildflyserver","msg":"Executing recovery scan at tx-server-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Pod IP":"172.17.0.10","Recovery port":4712}
      {"level":"error","ts":1568905712.3255754,"logger":"controller_wildflyserver","msg":"Failures during scaling down recovery processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-server","Desired replica size":0,"Number of pods to be removed":1,"error":"Found 1 errors:\n [[Failed to run transaction recovery scan for scaling down pod tx-server-0. Please, verify the pod log file. Error: Cannot process TCP connection to 172.17.0.10:4712, error: dial tcp 172.17.0.10:4712: connect: connection refused]],","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/wildfly/wildfly-operator/pkg/controller/wildflyserver.(*ReconcileWildFlyServer).Reconcile\n\t/go/src/github.com/wildfly/wildfly-operator/pkg/controller/wildflyserver/wildflyserver_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.12/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
      {"level":"info","ts":1568905712.3256311,"logger":"controller_wildflyserver","msg":"Scaling down statefulset by verification if pods are clean by recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      {"level":"info","ts":1568905712.3256419,"logger":"controller_wildflyserver","msg":"Statefulset was not fully scaled to the desired replica size 0 while StatefulSet is to be at size 1. Some pods were not cleaned by recovery. Verify status of the WildflyServer tx-server","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-server"}
      

      Attachments

        Issue Links

          Activity

            People

              ochaloup@redhat.com Ondrej Chaloupka (Inactive)
              msimka@redhat.com Martin Simka
              Martin Simka Martin Simka
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: