  1. WildFly Transaction Client
  2. WFTC-85

XAResourceRegistry record needs to be cleaned immediately after commit is called, even during recovery


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 1.1.13.Final
    • 1.1.12.Final
    • None

      There is a test in the OpenShift testsuite (see the steps to reproduce at WFLY-12922)

      mvn clean test -P72 -Dtest=EjbTxnRemotingScaleDownTest#testTxStatelessServerSecondCommitThrowRmFail -Dconsole-log-level=DEBUG
      

      There is an integration test in the EAP QE crashrec testsuite

      git clone git@gitlab.mw.lab.eng.bos.redhat.com:jbossqe-eap/tests-transactions.git
      mvn clean verify -am -pl jbossts -DfailIfNoTests=false -Djbossts.noJTS -Djboss.dist=$JBOSS_HOME -Dtest=TxPropagationJMSCrashRecoveryTestCase#injectRmfailAtServerCommit
      

      A new test case needs to be added to the WildFly manualmode integration tests (most probably under org.jboss.as.test.manualmode.ejb.client.outbound.connection.transaction.preparehalt.TransactionPropagationPrepareHaltTestCase)


      When transaction recovery runs commit on the SubordinateXAResource, that resource is created from the SerializedXAResource, which was first saved to the Narayana object store and then deserialized during recovery. Because the SerializedXAResource knows nothing about the XAResourceRegistry, the persistent record is not removed at commit; it is removed only in one of the following recovery cycles. That can be too late for OpenShift scale-down processing.

      More details

      OpenShift transaction scale-down processing
      This issue is related to OpenShift scale down transaction recovery processing.
      This handling is part of the WildFly Operator. When a WildFly deployment is asked to scale down (meaning that one pod should be shut down and its WildFly instance stopped), all unfinished transactions must be finished before WildFly is stopped. The WFLY instance is put to stale and the transaction recovery manager is triggered (possibly several times) to finish all transactions.
      If a transaction includes a remote EJB call to another WildFly instance, Narayana asks the WFTC to handle it. The WFTC saves data about remote EJB transaction processing as file records in the file system XAResourceRegistry (https://github.com/wildfly/wildfly-transaction-client/blob/1.1.12.Final/src/main/java/org/wildfly/transaction/client/provider/jboss/FileSystemXAResourceRegistry.java).
      The WildFly Operator permits stopping the WildFly instance only if there are no records in the Narayana object store (transaction storage) and no data in the file system XAResourceRegistry.
      The rule for the XAResourceRegistry exists because of the situation where WFTC sends a "prepare" call to a remote WildFly server and the original WildFly crashes. In that situation the WFTC remote call has prepared, but Narayana has not saved any record to the object store (Narayana saves one only after all participants have successfully finished the prepare phase). After WildFly restarts, Narayana asks the WFTC for "prepare" in-doubt resources and, if there are any, calls rollback on them.
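
The stop rule above can be pictured as a check over two directories. This is only a minimal sketch: the class name, the method names, and the idea that "no records" means "no regular files under the directory" are assumptions made for illustration, not the WildFly Operator's actual probe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical illustration of the scale-down rule; not WildFly Operator code. */
final class ScaleDownGateSketch {

    /** True when the directory is missing or contains no regular files at any depth. */
    static boolean isEmpty(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return true;
        }
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.noneMatch(Files::isRegularFile);
        }
    }

    /** The instance may stop only when both stores hold no records (assumed layout). */
    static boolean safeToStop(Path objectStoreDir, Path registryDir) throws IOException {
        return isEmpty(objectStoreDir) && isEmpty(registryDir);
    }
}
```

Under this model a single leftover registry file is enough to block the stop, which is why a record that is never cleaned up keeps the pod from ever being scaled down.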

      The issue
      The trouble in this issue is slightly different. Suppose the original WildFly instance fails to commit the remote EJB call through WFTC (e.g. because of an intermittent network issue). The recovery process then retries the WFTC commit after a while. Suppose also that the remote WildFly server is in the process of scaling down, i.e. OpenShift is trying to stop it. The remote server has unfinished data in its Narayana object store and waits for the originator to inform it about the transaction outcome.
      The network heals, the WFTC commit is processed, the remote WildFly server is clean of unfinished transactions, and it is stopped.
      This works fine, but when the WFTC processes the commit, the XAResourceRegistry record is not removed. Removal of the registry record may happen on the next "recover" call, which (given the periodicity of recovery processing) comes in 2 minutes.
      The next round of recovery processing asks the WFTC for XAResourceRegistry records and this "undeleted" one is returned. Narayana calls recover, which fails because the remote WildFly is already stopped. The registry record now hangs there forever.
      Besides the warning messages logged in server.log, if the original WildFly is then asked to scale down, it will never be scaled down because there is a record in the WFTC registry.

      Why this happens
      The reason is that the XAResourceRegistry works in the Narayana "bottom-up" recovery phase.
      Narayana recovery processing can be separated into two phases: "top-down" and "bottom-up". During "top-down", the records saved in the Narayana object store are processed and Narayana tries to commit them. During "bottom-up", the registered XARegistryHelpers (https://github.com/wildfly/wildfly-transaction-client/blob/1.1.12.Final/src/main/java/org/wildfly/transaction/client/provider/jboss/JBossLocalTransactionProvider.java#L98) are asked to provide unfinished transactions (that's where Narayana asks the WFTC for "prepare" in-doubt resources).
      In this case the "top-down" path is taken: the XAResource (https://github.com/wildfly/wildfly-transaction-client/blob/1.1.12.Final/src/main/java/org/wildfly/transaction/client/SerializedXAResource.java) was serialized into the Narayana object store, is then deserialized, and is simply committed (no registry is attached here: https://github.com/wildfly/wildfly-transaction-client/blob/1.1.12.Final/src/main/java/org/wildfly/transaction/client/SubordinateXAResource.java#L176).

      What is needed
      I would like to get the XAResourceRegistry record at the time of commit. I would like to be able to match the data saved in the XAResourceRegistry with a commit call on a SubordinateXAResource that originated from a SerializedXAResource and has no registry instance attached.
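
One way to picture this request is a lookup keyed by transaction id that a resource can consult at commit time even though it carries no registry reference. The sketch below is an in-memory model only: every class, field, and method name is hypothetical, and the real registry is file based.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of matching a registry record at commit time; not WFTC code. */
final class CommitTimeLookupSketch {

    /** Shared index: transaction id -> locations of enlisted remote resources (assumed shape). */
    static final Map<String, Set<String>> REGISTRY = new ConcurrentHashMap<>();

    static void enlist(String transactionId, String location) {
        REGISTRY.computeIfAbsent(transactionId, k -> ConcurrentHashMap.newKeySet()).add(location);
    }

    /** Drops one resource from the record; the record disappears with its last resource. */
    static void forget(String transactionId, String location) {
        REGISTRY.computeIfPresent(transactionId, (k, locations) -> {
            locations.remove(location);
            return locations.isEmpty() ? null : locations;
        });
    }

    /** Models a resource rebuilt from its serialized form: it knows only ids, no registry instance. */
    static final class DeserializedResource {
        final String transactionId;
        final String location;

        DeserializedResource(String transactionId, String location) {
            this.transactionId = transactionId;
            this.location = location;
        }

        void commit() {
            // The remote commit call would happen here (omitted in this sketch).
            // Afterwards the matching record is found by transaction id and cleaned at once.
            forget(transactionId, location);
        }
    }
}
```

With this shape the record is gone as soon as the commit succeeds, rather than waiting for a later "recover" call.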

      How the WFTC processing could be understood
      On enlisting to the transaction, the XARegistry record is created (https://github.com/wildfly/wildfly-transaction-client/blob/1.1.12.Final/src/main/java/org/wildfly/transaction/client/XAOutflowedResources.java#L60): a new file is created. It consists of the transaction UID (Narayana provides the transaction Xid; this is the part without the branch id, and the branch id is unique for each particular participant). So the XARegistry record name is the transaction id.
      When another XAResource becomes part of the transaction, it is added under that XAResourceRegistry record as well.
      The file is removed when no XAResources are registered under the particular XAResourceRegistry record.
      The removal happens under normal circumstances when commit/rollback is called, or during recovery when no data is returned by the XAResource.recover call or when recovery rolls back the XAResource.
      As mentioned above, the case of a commit during recovery on an XAResource (based on the serialized form) is not covered.
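
The record lifecycle described above can be sketched as one file per transaction that exists only while at least one resource is registered under it. The class and method names, and the one-line-per-resource file content, are assumptions made for illustration and do not reflect the actual FileSystemXAResourceRegistry format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a per-transaction registry file; not the WFTC implementation. */
final class RegistryRecordSketch {
    private final Path file;
    private final List<String> resources = new ArrayList<>();

    /** The record file is named after the transaction id (assumed to be a valid file name). */
    RegistryRecordSketch(Path registryDir, String transactionId) {
        this.file = registryDir.resolve(transactionId);
    }

    /** Enlisting: the first resource creates the file, later ones are added to it. */
    void addResource(String resourceLocation) {
        resources.add(resourceLocation);
        write();
    }

    /** Commit, rollback, or an empty recover result: the file goes away with the last resource. */
    void removeResource(String resourceLocation) {
        resources.remove(resourceLocation);
        if (resources.isEmpty()) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        } else {
            write();
        }
    }

    boolean exists() {
        return Files.exists(file);
    }

    /** Rewrites the whole file with one resource location per line. */
    private void write() {
        try {
            Files.write(file, resources, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model the uncovered case corresponds to a commit that never calls `removeResource`, which leaves the file on disk until some later recovery pass.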

            ochaloup@redhat.com Ondrej Chaloupka (Inactive)
            ochaloup@redhat.com Ondrej Chaloupka (Inactive)