Uploaded image for project: 'JBoss Enterprise Application Platform'
  1. JBoss Enterprise Application Platform
  2. JBEAP-5255

(7.1.0) Lost message when using transaction session on subscriber/consumer.

    XMLWordPrintable

Details

    • Hide

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git

      cd eap-tests-hornetq/scripts/

      git checkout refactoring_modules

      groovy -DEAP_VERSION=7.0.0.ER5 PrepareServers7.groovy

      export WORKSPACE=$PWD

      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap

      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap

      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap

      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap

      mkdir journal_A
      mkdir journal_B
      export JOURNAL_DIRECTORY_A=$WORKSPACE/journal_A
      export JOURNAL_DIRECTORY_B=$WORKSPACE/journal_B

      cd ../jboss-hornetq-testsuite/

      mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailbackTransAckTopic -DfailIfNoTests=false -Deap=7x | tee log

      Show
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ git checkout refactoring_modules groovy -DEAP_VERSION=7.0.0.ER5 PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap mkdir journal_A mkdir journal_B export JOURNAL_DIRECTORY_A=$WORKSPACE/journal_A export JOURNAL_DIRECTORY_B=$WORKSPACE/journal_B cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailbackTransAckTopic -DfailIfNoTests=false -Deap=7x | tee log

    Description

      We have lost messages in one of our HA test scenario.

      Test scenario:
      Two servers are configured in replicated colocated topology.
      Clients (producer and subscriber) are connected to server1 sending and receiving messages from topic.
      Server1 is killed and failover is executed. Then server1 is started again and clients make failback back and continue sending and receiving messages.

      Result:
      In the end clients are stopped and sent messages are compared with received messages.
      In our case one message is missing on subscriber probably due to exception [1]. On server side are warnings [2] and [3] (which I'm not sure if are related with this issue)
      Comment in PagecursorProviderImpl on line 113 says: "// sanity check, this ([3]) should never happen unless there's a bug"
      According to traces the lost message is large and is the last but one message written to the journal on last page .

      Expected result: No message loss or duplicates. We also do not expect exceptions [1][3].
      In attachment are traces of both nodes configs and complete log from test.

      Similar issue was hit in Artemis test suite. In the test
      PagingTest#testDeleteQueueRestart consumer get exception [4] when he
      tries to commit received messages. On server side warning messages [5]
      appear. I execute the test on matrix job with 9 combinations and it
      always fail in one or two cases.

      [1]

      10:39:33,727 Thread-29 ERROR [org.jboss.qa.hornetq.apps.clients.SubscriberTransAck:439] RETRY receive for host: 127.0.0.1, Trying to receive message with counter: 1081
      javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41
      	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:350)
      	at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendACK(ActiveMQSessionContext.java:421)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.acknowledge(ClientSessionImpl.java:696)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.doAck(ClientConsumerImpl.java:1035)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.acknowledge(ClientConsumerImpl.java:702)
      	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:96)
      	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:38)
      	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:212)
      	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:119)
      	at org.jboss.qa.hornetq.apps.clients.SubscriberTransAck.receiveMessage(SubscriberTransAck.java:427)
      	at org.jboss.qa.hornetq.apps.clients.SubscriberTransAck.run(SubscriberTransAck.java:139)
      Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41]
      

      [2]

      WARN  [org.apache.activemq.artemis.core.server] (Thread-6 (ActiveMQ-client-netty-threads-1223857560)) AMQ222086: error handling packet ReplicationCommitMessage[type=96, channel=2, journalID=1, txAction='rollback'] for replication: ActiveMQUnsupportedPacketException[errorType=UNSUPPORTED_PACKET message=null]
      	at org.apache.activemq.artemis.core.journal.impl.FileWrapperJournal.appendRollbackRecord(FileWrapperJournal.java:215) [artemis-journal-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendRollbackRecord(JournalBase.java:172) [artemis-journal-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.journal.impl.FileWrapperJournal.appendRollbackRecord(FileWrapperJournal.java:51) [artemis-journal-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handleCommitRollback(ReplicationEndpoint.java:609) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handlePacket(ReplicationEndpoint.java:182) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:567) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:349) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:331) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1105) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      

      [3]

      WARN  [org.apache.activemq.artemis.core.server] (Thread-9 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@57e1a31c-1495685008)) Invalid messageNumber passed = PagePositionImpl [pageNr=69, messageNr=0, recordID=2147488813] on null: java.lang.IllegalStateException: Invalid messageNumber passed = PagePositionImpl [pageNr=69, messageNr=0, recordID=2147488813] on null
      	at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:114) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl.queryMessage(PageSubscriptionImpl.java:511) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getPagedMessage(PagedReferenceImpl.java:64) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getMessage(PagedReferenceImpl.java:52) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.server.impl.RefsOperation.decrementRefCount(RefsOperation.java:163) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterCommit(RefsOperation.java:156) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterCommit(TransactionImpl.java:461) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$100(TransactionImpl.java:34) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$2.done(TransactionImpl.java:273) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:201) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:100) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_05]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_05]
      	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_05]
      

      [4]

      org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119014:
      Timed out after waiting 30,000 ms for response when sending packet 43
           at
      org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:368)
           at
      org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.simpleCommit(ActiveMQSessionContext.java:301)
           at
      org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:509)
           at
      org.apache.activemq.artemis.tests.integration.client.PagingTest.testDeleteQueueRestart(PagingTest.java:870)
      

      [5]

      06:01:20,799 WARN  [org.apache.activemq.artemis.core.server] Invalid
      messageNumber passed = PagePositionImpl [pageNr=344, messageNr=8,
      recordID=21496] on null: java.lang.IllegalStateException: Invalid
      messageNumber passed = PagePositionImpl [pageNr=344, messageNr=8,
      recordID=21496] on null
           at
      org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:117)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl.queryMessage(PageSubscriptionImpl.java:532)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getPagedMessage(PagedReferenceImpl.java:66)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getMessage(PagedReferenceImpl.java:53)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.server.impl.RefsOperation.decrementRefCount(RefsOperation.java:163)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.server.impl.RefsOperation.afterCommit(RefsOperation.java:156)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterCommit(TransactionImpl.java:488)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$100(TransactionImpl.java:34)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$2.done(TransactionImpl.java:287)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:209)
      [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:100)
      [artemis-core-client-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
           at
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [rt.jar:1.8.0_71]
           at
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [rt.jar:1.8.0_71]
           at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_71]
      

      This issue is still under investigation

      Note: hit also with 6.4.0.DR11, severity high - no customer case until today, thus not marked as blocker.

      Customer impact: Customer may loose important data which was contained inside the message.

      Attachments

        Issue Links

          Activity

            People

              rh-ee-ataylor Andy Taylor
              rhn-cservice-bbaranow Bartosz Baranowski
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: