Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-2891

Gap in time between commit of transaction and actual value update

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Do
    • Major
    • None
    • 5.2.1.Final
    • Transactions
    • None

    Description

      Since upgrading our AS7.2 dependency in Immutant (transitively pulling in 5.2.1.Final), one of our integration tests has begun failing intermittently on our CI server. We've yet to see the failure in local runs, only on CI, so I suspect there's something racist going on.

      The two tests (one for optimistic locking, the other for pessimistic) integrate an Infinispan cache (on which the Immutant cache is built) with HornetQ and XA transactions. A number of queue listeners respond to messages by attempting to increment a value in the cache. The failure occurs with both locking schemes, but much more often with optimistic.

      We've confirmed the failure on 5.2.2 as well.

      Attached you'll find three traces of the optimistic test: the good, the bad, and the ugly. All three correspond to this test: https://github.com/immutant/immutant/blob/31a2ef6222088ccb828898e9e3e4531e0333d517/integration-tests/apps/caching/counter/test/counter/test/locking.clj

      So you can correlate the log messages prefixed with "JC:" in the traces to the code. Note in particular the last two lines in locking.clj: a logged message containing the count, and then an assertion of the count. Note that the "bad" trace was an actual failing test, but the "ugly" trace was a successful test, even though the trace clearly shows the count logged as 2, not 3. The Infinispan TRACE output clearly shows the value as 3, hence the ugliness of this test.

      It's important to understand that the "work" function occurs within an XA transaction. This means, as I understand it, that if three messages are published to "/queue/done", the cached count should equal 3. Line #44 in locking.clj will block until it receives 3 messages, after which the cached count should be 3.

      These tests always pass locally. They only ever fail on CI, which runs very slowly.

      Attachments

        1. bad.log
          109 kB
        2. good.log
          110 kB
        3. pessimistic.log
          112 kB
        4. ugly.log
          110 kB

        Activity

          People

            rh-ee-galder Galder ZamarreƱo
            jcrossle@redhat.com Jim Crossley
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: