  1. JBoss Enterprise Application Platform
  2. JBEAP-17956

Scale-down of a client pod holding an in-doubt transaction does not succeed when a server pod that is part of the transaction is unreachable


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • None
    • OpenShift, Transactions
    • As explained in Ondra's comment, this issue is not a bug but a limitation of the current implementation.

      It can be reproduced using this test:

      git clone git@gitlab.mw.lab.eng.bos.redhat.com:msimka/openshift-eap-tests.git
      cd openshift-eap-tests
      git checkout EAP7-1192_scaledown
      # create the file test.properties (contents below)
      
      # the test is disabled; enable it first
      mvn clean test -P72 -Dtest=EjbTxnRemotingScaleDownTest#testTxStatelessServerSecondPrepareJvmHaltScaleDownClient -Dconsole-log-level=DEBUG
      

      test.properties

      xtf.openshift.url=<os4/crc cluster url>
      xtf.openshift.namespace=wip-namespace
      xtf.bm.namespace=wip-builds-namespace
      
      xtf.eap.72.image=docker-registry.upshift.redhat.com/kwills/eap-cd-openshift-rhel8:18.0-EAP7-1216
      xtf.eap.72.properties.eap.imagestream.name=jboss-eap73-openshift
      
      xtf.eap.72.version=7.2.0.GA
      xtf.eap.properties.location=/opt/eap
      xtf.eap.72.templates.repo=git://github.com/jboss-container-images/jboss-eap-7-openshift-image.git,git://github.com/jboss-container-images/redhat-sso-7-openshift-image.git
      xtf.eap.72.templates.branch=eap72,v7.3.0.GA
      
      xtf.operator.eap.image=registry-proxy.engineering.redhat.com/rh-osbs/jboss-eap-7-tech-preview-eap-operator:jb-eap-7.3-operator-rhel8-containers-candidate-48963-20191008130923
      
      # specify the correct path to the oc binary
      xtf.openshift.binary.path=<oc_binary_path>
      xtf.openshift.token=<user token>
      

    Description

      While testing transaction recovery in OpenShift, I see that scaling down a client pod that holds an in-doubt transaction does not succeed when a server pod that is part of the transaction is unreachable.

      Scenario:

      ejb client (app tx-client, pod tx-client-0):

      • EJB business method
        • lookup remote EJB
        • enlist XA resource 1 to transaction
        • enlist XA resource 2 to transaction
        • call remote EJB

      ejb server (app tx-server, pod tx-server-0):

      • EJB business method
        • enlist XA resource 1 to transaction
        • enlist XA resource 2 to transaction

      testTxStatelessServerSecondPrepareJvmHaltScaleDownClient

      Test workflow:

      • an XA resource on the ejb server crashes the JVM on the tx-server pod
      • the label "wildfly.org/operated-by-headless" of the server pod is changed, which makes the server unreachable
      • the tx-server pod is scaled down
      • the tx-client pod is scaled down
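
A toy model may help show why the last step of the workflow above cannot complete. It assumes that a pod is removed only once its in-doubt records are resolved, and that resolving a record requires reaching the other participant. The `Pod` class and the `recovery_scan` and `try_scale_down` functions are hypothetical names for illustration; this is not the operator's code.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    reachable: bool = True
    # Names of the pods that must be contacted to resolve each in-doubt record.
    in_doubt: list = field(default_factory=list)


def recovery_scan(pod, cluster):
    """Drop every in-doubt record whose participant can be contacted."""
    pod.in_doubt = [
        participant
        for participant in pod.in_doubt
        if not (participant in cluster and cluster[participant].reachable)
    ]


def try_scale_down(pod, cluster, max_scans=3):
    """Remove the pod only if a recovery scan leaves no in-doubt records."""
    for _ in range(max_scans):
        recovery_scan(pod, cluster)
        if not pod.in_doubt:
            del cluster[pod.name]
            return True
    # Records remain, so the pod stays and the scale-down is left for a retry.
    return False


cluster = {
    "tx-server-0": Pod("tx-server-0", reachable=False),
    "tx-client-0": Pod("tx-client-0", in_doubt=["tx-server-0"]),
}
print(try_scale_down(cluster["tx-client-0"], cluster))
```

In this model the client pod stays in the cluster for as long as `tx-server-0` is unreachable, which matches the retry behaviour reported in the issue.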

      The scale-down of the client pod hangs:

      {"level":"info","ts":1570636701.0314589,"logger":"wildflyserver_controller","msg":"Scaling down statefulset by verification if pods are clean by recovery","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-client"}
      {"level":"info","ts":1570636701.0314867,"logger":"wildflyserver_controller","msg":"Statefulset was not scaled to the desired replica size 0 (current StatefulSet size: 1). Transaction recovery scaledown process has not cleaned all pods. Please, check status of the WildflyServer tx-client","StatefulSet.Namespace":"msimka-namespace","StatefulSet.Name":"tx-client"}
      {"level":"info","ts":1570636703.5918324,"logger":"wildflyserver_controller","msg":"Reconciling WildFlyServer","Request.Namespace":"msimka-namespace","Request.Name":"tx-client"}
      {"level":"info","ts":1570636703.5919785,"logger":"wildlfyserver_resources","msg":"Getting resource","WildFlyServer.Namespace":"msimka-namespace","WildFlyServer.Name":"tx-client","Resource.Name":"tx-client"}
      {"level":"info","ts":1570636703.5920458,"logger":"wildlfyserver_resources","msg":"Got resource","WildFlyServer.Namespace":"msimka-namespace","WildFlyServer.Name":"tx-client","Resource.Name":"tx-client"}
      {"level":"info","ts":1570636703.5921679,"logger":"wildflyserver_controller","msg":"Transaction recovery scaledown processing","Request.Namespace":"msimka-namespace","Request.Name":"tx-client","Pod Name":"tx-client-0","IP Address":"10.128.1.34","Pod State":"SCALING_DOWN_RECOVERY_DIRTY","Pod Phase":"Running"}
      {"level":"info","ts":1570636703.5922475,"logger":"wildflyserver_controller","msg":"Recovery properties at pod were already defined. Skipping server restart.","Request.Namespace":"msimka-namespace","Request.Name":"tx-client","Pod Name":"tx-client-0"}
      {"level":"info","ts":1570636703.5971646,"logger":"wildflyserver_controller","msg":"Executing recovery scan at tx-client-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-client","Pod IP":"10.128.1.34","Recovery port":4712}
      {"level":"info","ts":1570636709.162107,"logger":"wildflyserver_controller","msg":"Executing recovery scan at tx-client-0","Request.Namespace":"msimka-namespace","Request.Name":"tx-client","Pod IP":"10.128.1.34","Recovery port":4712}
      {"level":"info","ts":1570636714.7187033,"logger":"wildflyserver_controller","msg":"In-doubt transactions in object store","Request.Namespace":"msimka-namespace","Request.Name":"tx-client","Pod Name":"tx-client-0","Message":"WildFly Transaction Client data dir is not empty and scaling down of the pod 'tx-client-0' will be retried.Wildfly Transacton Client data dir path '/opt/eap/standalone/data/ejb-xa-recovery', output listing: 20005_00000000000000000000ffff0a80012251f788695d9e026b0000001374782d636c69656e742d30_00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\n"}
      


          People

            jfinelli@redhat.com Manuel Finelli
            msimka@redhat.com Martin Simka
            Martin Simka Martin Simka
            Martin Simka Martin Simka