Project: JBoss Enterprise Application Platform
Issue: JBEAP-17894

Documentation about transaction heuristic outcomes is wrong


Details

    Description

      The sections on clearing up expired transactions and recovering heuristic outcomes in the JBoss EAP documentation (https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/7.2/html-single/managing_transactions_on_jboss_eap/,
      revision 46e9b7e8965844814ed498bf3c4ffe504049ce4b:en-us, revised on 2019-09-26)
      are imprecise and wrong in some places. Please change them as follows.

      Section 5.4. Clearing Up Expired Transactions

      Task: The section refers to the objects that are created as `ExpiryEntryMonitor`s.
      This is misleading: the text is about expiry scanners, and the Narayana code base has no `ExpiryEntryMonitor`.
      Narayana does have `ExpiredEntryMonitor`, but that is the name of the thread which manages the `ExpiryScanner`s.
      It is not something that can be configured via a system property.
      Change: replace the term `ExpiryEntryMonitor` with `ExpiryScanner`, and `ExpiryEntryMonitor thread` with `ExpiredEntryMonitor thread`.

      Task: Clarify what an expired transaction is; the section currently says nothing about it. Please add the following text.
      Text: An expired transaction is a transaction that has stayed in the object store for a long time.
      The age at which a transaction is considered expired is configurable.
      By default, a transaction is considered expired after 12 hours.
      By default, WildFly configures no expiry scanner that works with transactions.
      Expiry scanners have to be configured via system properties (see below).

      The ExpiryScanner is an abstraction that handles objects saved in the object store.
      The best-known objects in the object store are transactions,
      but the Narayana transaction manager stores other types of objects there too.

      Currently, Narayana provides several expiry scanners, each for a different object type and with different functionality.
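A minimal Java sketch of the abstraction described above, to make the "one scanner per object type" idea concrete. All names here (`ObjectStoreModel`, `ExpiryScannerModel`, `RemovingScanner`, `ExpiryMonitorModel`) are hypothetical; this is not Narayana's `ExpiryScanner` API, and the object store is reduced to an in-memory map of entry ages. Only the 12-hour default comes from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the object store: object type -> ages (in hours) of stored entries.
class ObjectStoreModel {
    final Map<String, List<Long>> entriesByType = new HashMap<>();

    void add(String type, long ageHours) {
        entriesByType.computeIfAbsent(type, t -> new ArrayList<>()).add(ageHours);
    }

    int count(String type) {
        return entriesByType.getOrDefault(type, List.of()).size();
    }
}

// Hypothetical scanner abstraction: each implementation is responsible for one object type.
interface ExpiryScannerModel {
    void scan(ObjectStoreModel store);
}

// One possible strategy: delete entries of one type once they are older than the expiry time.
class RemovingScanner implements ExpiryScannerModel {
    static final long DEFAULT_EXPIRY_HOURS = 12; // default expiry age quoted in the text

    private final String type;
    private final long expiryHours;

    RemovingScanner(String type, long expiryHours) {
        this.type = type;
        this.expiryHours = expiryHours;
    }

    @Override
    public void scan(ObjectStoreModel store) {
        List<Long> entries = store.entriesByType.get(type);
        if (entries != null) {
            // Only entries of this scanner's own type are touched.
            entries.removeIf(age -> age > expiryHours);
        }
    }
}

// Hypothetical monitor: runs every configured scanner once per scan cycle.
class ExpiryMonitorModel {
    static void runScanCycle(List<ExpiryScannerModel> scanners, ObjectStoreModel store) {
        for (ExpiryScannerModel scanner : scanners) {
            scanner.scan(store);
        }
    }
}
```

Because each scanner only handles its own object type, a scanner with a different strategy (for example, moving instead of deleting) can be added without affecting the others.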

      Task: Some sentences are misleading or wrong.

      Change paragraph:
      ```
      ExpiryEntryMonitor

      When the Recovery Manager initializes an expiry scanner thread, the ExpiryEntryMonitor object is created,
      which is used to remove dead items from the object store.
      A number of scanner modules are loaded dynamically, which removes the dead items for a particular type.

      You can configure the scanner modules in the properties file using the RecoveryEnvironmentBean.expiryScanners system property.
      The scanner modules are loaded at the time of initialization.
      ```

      To new text:
      ```
      ExpiryScanners
      The Recovery Manager starts an expiry scanner thread, which is a daemon thread named ExpiredEntryMonitor.
      The thread runs the ExpiryScanners and manages expired objects in the object store.
      Each ExpiryScanner is responsible for a different object type and has a different strategy for handling
      the expired objects.

      You can configure the expiry scanners using the RecoveryEnvironmentBean.expiryScanners system property.
      ```
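The expiryScanners property takes a list of scanner class names. The sketch below assumes the value is a whitespace-separated list and that the short `RecoveryEnvironmentBean.` prefix is accepted as the system property name; both are assumptions to check against your Narayana version. `ExpiryScannersProperty` is a hypothetical helper that only parses the value; it does not load any scanner.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of Narayana.
class ExpiryScannersProperty {
    // Property name as written in the text (short prefix form assumed).
    static final String PROPERTY = "RecoveryEnvironmentBean.expiryScanners";

    // Splits an (assumed) whitespace-separated list of scanner class names.
    static List<String> parse(String value) {
        if (value == null || value.isBlank()) {
            return List.of(); // nothing configured: no expiry scanners would run
        }
        return Arrays.stream(value.trim().split("\\s+")).collect(Collectors.toList());
    }

    // Reads the value from the JVM system properties.
    static List<String> fromSystemProperties() {
        return parse(System.getProperty(PROPERTY));
    }
}
```

For example, a JVM started with `-DRecoveryEnvironmentBean.expiryScanners="com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner com.arjuna.ats.internal.arjuna.recovery.AtomicActionExpiryScanner"` would, under the same assumptions, yield two class names; verify the fully qualified names against your Narayana version.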

      Paragraph:
      ```
      All the scanner modules are called periodically to scan for dead items by the ExpiryEntryMonitor thread.
      You can configure this period, in hours, using the expiryScanInterval system property, as shown in the example below:
      ...
      All scanner modules inherit the same behaviour from the ExpiryScanner interface.
      This interface provides a scan method that is implemented by all the scanner modules, including the following. The scanner thread calls this scan method.
      ```

      New text:
      ```
      The ExpiredEntryMonitor thread periodically calls all the scanner modules to scan for expired objects.
      You can configure this period, in hours, using the expiryScanInterval system property, as shown in the example below.
      The scan runs at startup and is then repeated at the defined interval. If the interval is defined as a negative number,
      the initial scan is skipped and the first scan runs only after the absolute value of the interval has passed.
      ...
      <<remove the remaining sentences of the original paragraph from this place in the documentation>>
      ```
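The timing rule in the new text can be modelled as a small Java function that lists when the first few scans happen. This is a hypothetical sketch (`ExpiryScanSchedule` is not a Narayana class); reading a negative interval as "skip the startup scan, then use its absolute value" follows the text above, and treating zero as "no scanning" is an additional assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the expiry scan timing; not Narayana code.
class ExpiryScanSchedule {
    // Returns the times (hours after startup) of the first `count` scans.
    static List<Long> firstScans(long intervalHours, int count) {
        List<Long> times = new ArrayList<>();
        if (intervalHours == 0) {
            return times; // assumption: a zero interval disables scanning
        }
        long period = Math.abs(intervalHours);
        // Positive interval: scan at startup. Negative interval: first scan after one period.
        long first = intervalHours > 0 ? 0 : period;
        for (int i = 0; i < count; i++) {
            times.add(first + i * period);
        }
        return times;
    }
}
```

With an interval of 12 the first three scans happen at hours 0, 12 and 24; with -12 they happen at hours 12, 24 and 36.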

      Paragraph
      ```
      ExpiredTransactionStatusManagerScanner
      The ExpiredTransactionStatusManagerScanner removes the dead TransactionStatusManagerItems from the object store.
      These items remain in the object store for a certain period before they are deleted, which is 12 hours by default. You can configure this time period, in hours, using the transactionStatusManagerExpiryTime system property as shown in the example below:
      ```

      New text:
      ```
      ExpiredTransactionStatusManagerScanner
      The ExpiredTransactionStatusManagerScanner removes the dead TransactionStatusManagerItems from the object store.
      These items represent transaction runtime data: cached information on how to connect to the Narayana internal status manager object.
      These items remain in the object store for a certain period before they are deleted, which is 12 hours by default. You can configure this time period, in hours, using the transactionStatusManagerExpiryTime system property as shown in the example below:
      ```
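A sketch of the age check behind the 12-hour default, assuming the setting is read as a JVM system property named `RecoveryEnvironmentBean.transactionStatusManagerExpiryTime` that holds a whole number of hours. `StatusManagerItemExpiry` is a hypothetical helper, not Narayana code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the expiry rule for TransactionStatusManager items.
class StatusManagerItemExpiry {
    static final String PROPERTY = "RecoveryEnvironmentBean.transactionStatusManagerExpiryTime";
    static final long DEFAULT_HOURS = 12; // default quoted in the text

    // Configured expiry time in hours, falling back to the default when the property is unset.
    static long expiryHours() {
        String value = System.getProperty(PROPERTY);
        return value == null ? DEFAULT_HOURS : Long.parseLong(value.trim());
    }

    // An item is expired once it is older than the expiry time.
    static boolean isExpired(Instant createdAt, Instant now, long expiryHours) {
        return Duration.between(createdAt, now).compareTo(Duration.ofHours(expiryHours)) > 0;
    }
}
```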

      Paragraph:
      ```
      AtomicActionExpiryScanner
      The AtomicActionExpiryScanner moves transaction logs for AtomicActions that are assumed to have completed.
      For example, if a failure occurs after a participant has been told to commit but before the transactions subsystem can update the logs,
      then upon recovery the JBoss EAP transaction manager attempts to replay the commit request. This replay will obviously fail,
      thus preventing the log from being removed. The AtomicActionExpiryScanner is also used when logs cannot be recovered automatically
      for reasons such as being corrupt or zero-length. All logs are moved to a specific location based on the old location appended with /Expired.
      ```

      New text:
      ```
      AtomicActionExpiryScanner
      The AtomicActionExpiryScanner moves transaction logs for AtomicActions that are assumed to have completed.
      This object type represents the transactions as they are known from the JTA specification when the Narayana JTA mode is used.
      For example, if a failure occurs after a participant has been told to commit but before the transactions subsystem can update the logs,
      then upon recovery the JBoss EAP transaction manager attempts to replay the commit request. This replay fails
      because the commit is called a second time after it was already done, which prevents the log from being removed.
      In such a case, the log is left in the object store and warnings about an unknown object item are reported in the server log.
      The AtomicActionExpiryScanner is also used when logs cannot be recovered automatically
      for reasons such as being corrupt or zero-length. All logs are moved to a specific location based on the old location appended with /Expired.
      ```
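The "move instead of delete" behaviour can be sketched as follows. The object store is modelled as a map from a type path to log IDs; `ExpiredLogMover` and the example type name used with it are hypothetical, and only the rule "old location with /Expired appended" is taken from the text.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of moving assumed-complete logs aside; not Narayana code.
class ExpiredLogMover {
    // Object store model: type path -> IDs of logs stored under that type.
    final Map<String, Set<String>> store = new HashMap<>();

    void put(String type, String uid) {
        store.computeIfAbsent(type, t -> new HashSet<>()).add(uid);
    }

    // The new location is the old location with "/Expired" appended.
    static String expiredLocation(String type) {
        return type + "/Expired";
    }

    // Moves one log out of its original location; returns false if it was not there.
    boolean moveToExpired(String type, String uid) {
        Set<String> logs = store.get(type);
        if (logs == null || !logs.remove(uid)) {
            return false;
        }
        put(expiredLocation(type), uid);
        return true;
    }
}
```

Keeping the log under the new location, rather than deleting it, leaves it available for manual inspection.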

      Add text:
      ```
      JTS processing may use some additional scanner implementations:

      ExpiredToplevelScanner
      Removes ArjunaTransactionImple/AssumedCompleteTransaction records from the object store.
      This object type represents the transactions as they are known from the JTA specification when the Narayana JTS mode is used.
      An AssumedCompleteTransaction record originates as the type ArjunaTransactionImple and is changed to the assumed-complete type by JTS periodic recovery processing.

      ExpiredServerScanner
      Removes ArjunaTransactionImple/AssumedCompleteServerTransaction records from the object store.
      This object type represents subordinate transactions (e.g. imported by JCA) when the Narayana JTS mode is used.
      An AssumedCompleteServerTransaction record originates as the type ArjunaTransactionImple/ServerTransaction/JCA and is changed to the assumed-complete type by JTS periodic recovery processing.

      ExpiredContactScanner
      Removes the records that let the recovery manager know which Narayana instance belongs to which JVM.
      This record is not connected with transaction processing and represents Narayana internal runtime data.
      ```
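To show the two-step lifecycle of the JTS records described above (periodic recovery first changes the record type, then an expiry scanner removes the record), here is a hypothetical lookup table in Java. The abbreviated type names are copied from the text; real object-store type paths may carry a longer prefix, and `JtsExpiryScanners` is not a Narayana class.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup tables built from the text; not Narayana code.
class JtsExpiryScanners {
    // Original record type -> "assumed complete" type assigned by JTS periodic recovery.
    static final Map<String, String> ASSUMED_TYPE = Map.of(
        "ArjunaTransactionImple", "ArjunaTransactionImple/AssumedCompleteTransaction",
        "ArjunaTransactionImple/ServerTransaction/JCA", "ArjunaTransactionImple/AssumedCompleteServerTransaction");

    // Assumed-complete record type -> scanner that removes it from the object store.
    static final Map<String, String> REMOVED_BY = Map.of(
        "ArjunaTransactionImple/AssumedCompleteTransaction", "ExpiredToplevelScanner",
        "ArjunaTransactionImple/AssumedCompleteServerTransaction", "ExpiredServerScanner");

    // Scanner that would eventually remove a record of the given original type, if any.
    static Optional<String> eventualScanner(String originalType) {
        return Optional.ofNullable(ASSUMED_TYPE.get(originalType)).map(REMOVED_BY::get);
    }
}
```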

      Section 5.5. Recovering Heuristic Outcomes

      Task: The terms 'HEURISTIC_COMMIT', 'HEURISTIC_ROLLBACK', ... are not exceptions and are not used in the JTA specification.
      They represent values of an internal Narayana enum that is saved with the transaction log in the object store.
      Change text:
      Heuristic completion throws one of the following heuristic outcome exceptions:
      -> For a heuristic outcome, the participant record in the object store may have one of the following statuses:

      Task: The whole part about the statuses (which are wrongly presented as exceptions) is wrong. Once a heuristic state is reached,
      the outcome is heuristic, that is, unknown. The documentation must not contain phrases like "In this case, you need not do anything because a consistent termination was reached."

      Paragraph:
      ```
      HEURISTIC_COMMIT
      This exception is thrown when the transaction manager decides to rollback, but somehow all the resources had already committed on their own. In this case, you need not do anything because a consistent termination was reached.
      HEURISTIC_ROLLBACK
      This exception implies that the resources have all done a rollback because the commit decision from the transaction manager was delayed. Similar to HEURISTIC_COMMIT, in this case also you need not do anything because a consistent termination was reached.
      HEURISTIC_HAZARD
      This exception occurs when the disposition of some of the updates is unknown. For those that are known, they have either all been committed or all rolled back.
      HEURISTIC_MIXED
      This exception occurs when some parts of the transaction were rolled back while others were committed.
      ```

      New text:
      ```
      HEURISTIC_COMMIT
      The participant was asked to roll back but committed instead. We may assume that the transaction as a whole was rolled back,
      but this participant made a different decision about its outcome.
      HEURISTIC_ROLLBACK
      The participant was asked to commit but rolled back instead. We may assume that the transaction as a whole was committed,
      but this participant made a different decision about its outcome.
      HEURISTIC_HAZARD
      An unknown error happened during processing; the participant may have been committed, rolled back, or neither.
      HEURISTIC_MIXED
      The participant may comprise several sub-participants forming a tree structure; these sub-participants are called branches.
      If the transaction manager commands the participant to commit, the participant calls commit on all its branches.
      A heuristic mixed outcome means that the participant was both committed and rolled back,
      as some branches were committed and some were rolled back.
      ```
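The four statuses can be summarised as a hypothetical Java classifier that looks at the decision the transaction manager sent and at what each branch of the participant actually did. This is not Narayana's internal enum or logic: `OK` stands for "no heuristic outcome", and giving HEURISTIC_HAZARD precedence whenever any branch outcome is unknown is a modelling choice of this sketch.

```java
import java.util.List;

// Decision sent by the transaction manager to the participant.
enum Decision { COMMIT, ROLLBACK }

// What a single branch of the participant actually did.
enum BranchResult { COMMITTED, ROLLED_BACK, UNKNOWN }

// Hypothetical status names mirroring the text; OK means "no heuristic outcome".
enum ParticipantStatus { OK, HEURISTIC_COMMIT, HEURISTIC_ROLLBACK, HEURISTIC_HAZARD, HEURISTIC_MIXED }

class HeuristicClassifier {
    static ParticipantStatus classify(Decision requested, List<BranchResult> branches) {
        if (branches.isEmpty()) {
            throw new IllegalArgumentException("a participant has at least one branch");
        }
        if (branches.contains(BranchResult.UNKNOWN)) {
            return ParticipantStatus.HEURISTIC_HAZARD; // outcome of some branch is unknown
        }
        boolean anyCommitted = branches.contains(BranchResult.COMMITTED);
        boolean anyRolledBack = branches.contains(BranchResult.ROLLED_BACK);
        if (anyCommitted && anyRolledBack) {
            return ParticipantStatus.HEURISTIC_MIXED; // branches disagree with each other
        }
        if (requested == Decision.ROLLBACK && anyCommitted) {
            return ParticipantStatus.HEURISTIC_COMMIT; // asked to roll back, committed instead
        }
        if (requested == Decision.COMMIT && anyRolledBack) {
            return ParticipantStatus.HEURISTIC_ROLLBACK; // asked to commit, rolled back instead
        }
        return ParticipantStatus.OK;
    }
}
```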

      Adjust:
      This procedure shows how to handle a heuristic outcome of a transaction using the Java Transaction API (JTA).
      -> This procedure shows how to handle a heuristic outcome of a transaction using the JBoss EAP CLI commands.

      Task: Remove the paragraph about transient failures. Transient failures cannot cause a heuristic outcome for a transaction.
      A heuristic outcome means there is a persistent failure in processing, not an intermittent outage.
      The transaction manager is capable of handling transient failures.
      Remove this:
      Usually, if there is a transient failure in your environment, you will know about it before you find out about the heuristic error. This could be due to a network outage, hardware failure, database failure, power outage, or a host of other things.

      Change this:
      If you come across a heuristic outcome in a test environment during stress testing, it implies weaknesses in your test environment.
      -> If you come across a heuristic outcome in a test environment during stress testing, it implies a serious issue in transaction processing, and you should investigate the logs of the resources and of the transaction manager in depth.

      Section 5.5.1. Guidelines on Making Decisions for Heuristic Outcomes

      Text addition:
      The documentation needs to state that the `probe` operation must be invoked before the `read-resource` CLI call.

      ```
      Before reading information on participant status, the log store has to be probed by calling
      /subsystem=transactions/log-store=log-store:probe()
      The follow-up read-resource commands read the snapshot of the object store retrieved by the probe call.

      You can use the read-resource operation to check the status of the participants in the transaction:
      ```
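The probe/read-resource relationship can be sketched with a small Java model in which read-resource only sees the snapshot taken by the last probe. `LogStoreModel` and its methods are hypothetical stand-ins for the CLI operations, not the WildFly management API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of probe()/read-resource semantics; not the WildFly management API.
class LogStoreModel {
    // Live object store: participant ID -> status, as changed by the transaction manager.
    private final Map<String, String> live = new TreeMap<>();
    // Snapshot taken by the last probe; read-resource only sees this.
    private Map<String, String> snapshot = Map.of();

    void setLiveStatus(String participant, String status) {
        live.put(participant, status);
    }

    // Stands in for /subsystem=transactions/log-store=log-store:probe()
    void probe() {
        snapshot = Map.copyOf(live);
    }

    // Stands in for read-resource: status from the last snapshot, or null if not probed yet.
    String readResource(String participant) {
        return snapshot.get(participant);
    }
}
```

In this model, reading without a prior probe returns nothing, and a status change in the live store stays invisible until the next probe.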

      Sub-section: Recovering the HEURISTIC_HAZARD Exception

      Text adjustments:

      Recovering the HEURISTIC_HAZARD Exception
      -> Recovering participants in the HEURISTIC_HAZARD or HEURISTIC_MIXED state

      A recoverable resource maintains all the information about the heuristic decision in stable storage until it is required by the transaction manager.
      -> A recoverable resource (e.g. a database or a JMS broker) maintains all the information about the heuristic decision in its own stable storage until it is required by the transaction manager.

      Heuristic outcomes are stored in the server log and can be identified using the resource manager and transaction manager.
      -> Heuristic outcomes are stored in the JBoss EAP object store and can be identified using the resource manager
      (which represents a link to the recoverable resource) and transaction manager.

      You must rather inspect the resource manager to know the state of the heuristic exception.
      -> Instead, you must inspect the resource manager to learn the state of the participant and its branches.

      Running the recover operation changes the state of the transaction to PREPARE
      -> Running the recover operation changes the state of the participant from HEURISTIC to PREPARE

      You can verify this by running the probe operation on the log-store element again.
      -> You can verify this by running the probe operation on the log-store element again
      and then the read-resource operation.
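A hypothetical Java sketch of why the probe has to be run again after recover: the recover operation changes the participant status in the object store, but read-resource keeps returning the old snapshot until the next probe. The status strings HEURISTIC and PREPARE follow the wording of the text; the real attribute values may differ, and `RecoverThenVerify` is not a WildFly class.

```java
// Hypothetical model of a single participant; not the WildFly management API.
class RecoverThenVerify {
    private String liveStatus = "HEURISTIC"; // status in the object store
    private String snapshotStatus;           // null until the first probe

    // Stands in for the probe operation: refreshes the snapshot.
    void probe() {
        snapshotStatus = liveStatus;
    }

    // Stands in for the recover operation: HEURISTIC -> PREPARE in the object store.
    void recover() {
        if ("HEURISTIC".equals(liveStatus)) {
            liveStatus = "PREPARE";
        }
    }

    // Stands in for read-resource: returns the status from the last snapshot.
    String readResource() {
        return snapshotStatus;
    }
}
```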

      Sub-section: Recovering the HEURISTIC_ROLLBACK and HEURISTIC_COMMIT Exceptions

      Text adjustments:

      Recovering the HEURISTIC_ROLLBACK and HEURISTIC_COMMIT Exceptions
      -> Recovering participants in the HEURISTIC_ROLLBACK and HEURISTIC_COMMIT states

      Text adjustment:

      If the heuristic outcome is a rollback type, then:

      • The resource should not be able to commit the transaction, provided the resource manager is well implemented.
      • You must decide whether you should delete the branch from the resource manager, using a forget call, so that the rest of the transaction can commit normally and be cleaned from the transaction store.
      • If you do not delete the branch from the resource manager, then the transaction will remain in the transaction store forever.
        On the other hand, if the heuristic outcome was a commit type, then you must use the business semantics to deal with the inconsistent outcome.
        ->
      • The resource should not be able to commit or roll back the transaction, provided the resource manager is well implemented.
      • You must decide whether you should delete the branch, if it exists, from the resource manager, using a forget call,
        so that the rest of the transaction can commit normally and be cleaned from the resource stable store.
      • If you do not delete the existing branch from the resource manager, then the transaction will remain in the resource stable store forever.
      • It may be necessary to use the business semantics to deal with the inconsistent outcome.
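A minimal model of the forget step, assuming a resource manager that keeps heuristic branches in its own stable store until it is told to forget them. In XA terms the corresponding call is `XAResource.forget(Xid)`; the `ResourceStableStore` class below is hypothetical and does not implement the XA interface.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a resource manager's own stable store; not the XA API.
class ResourceStableStore {
    private final Set<String> heuristicBranches = new HashSet<>();

    void recordHeuristic(String branchId) {
        heuristicBranches.add(branchId);
    }

    // Models a forget call; returns false if the branch does not exist.
    boolean forget(String branchId) {
        return heuristicBranches.remove(branchId);
    }

    // Nothing else in this model removes a branch, so without forget it stays forever.
    boolean remembers(String branchId) {
        return heuristicBranches.contains(branchId);
    }
}
```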

      Sub-section: Further Actions When Manual Reconciliation Fails

      Task: Remove the following paragraph, as it is Oracle-specific and otherwise vague:

      You can check the database transaction table, which is the DBA_2PC_PENDING table for Oracle. However, these will depend upon the specific resource managers. Transaction Manager can provide you with the branches to inspect in each resource manager.

      Text adjustment:

      You should consult the vendor's documentation on this resource manager for details.
      -> Verification of the branches and the resource stable store depends on the specific resource manager.
      You should consult the vendor's documentation on the resource manager for details (see the reference links above).


          People

            Dhruv Soni (dsoni@redhat.com)
            Ondrej Chaloupka (ochaloup@redhat.com, inactive)
            Marek Kopecky
            Votes: 0
            Watchers: 5
