  WildFly WIP / WFWIP-23

[Artemis 2.x upgrade] Stuck messages in artemis.internal.sf.my-cluster... queue after restarting nodes in cluster


Details


      Steps to reproduce:

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout f3de9baf1f8b39b810bf3d55d45f4341e3b2aa87
      
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/258/artifact/jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      cd ../jboss-hornetq-testsuite/
      mvn clean test -Dtest=ClusterTestCase#testStopStartCluster -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1521544306-SNAPSHOT -DfailIfNoTests=false  | tee log
      

    Description

      There are lost messages in a scenario where the nodes in a cluster are cleanly stopped and started again. This issue was hit with Artemis 2.5.0.Final and Jeff's WildFly integration branch WFLY-9407_upgrade_artemis_2.4.0_with_prefix.

      Test Scenario:

      • start two servers in a cluster (JGroups used for discovery)
      • send messages to testQueue0 on node-1 and node-2
      • wait until consumers on both nodes receive 300 messages
      • cleanly shut down the 1st and then the 2nd server
      • leave the servers shut down for one minute
      • start both servers again (a rough shell sketch of these restart steps follows this list)
      • wait until both consumers receive 500 messages
      • stop sending messages and receive all remaining messages
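
      A rough shell sketch of the restart steps above (an approximation, not the test suite's own code): it assumes the servers prepared by PrepareServers7.groovy run the full-ha profile, node-1 uses the default ports, and node-2 runs with a port offset of 1000, so its management interface listens on 10990.

      # cleanly shut down node-1 and then node-2
      $JBOSS_HOME_1/bin/jboss-cli.sh --connect --controller=localhost:9990 --command=":shutdown"
      $JBOSS_HOME_2/bin/jboss-cli.sh --connect --controller=localhost:10990 --command=":shutdown"

      # leave both servers down for one minute
      sleep 60

      # start both servers again
      $JBOSS_HOME_1/bin/standalone.sh -c standalone-full-ha.xml &
      $JBOSS_HOME_2/bin/standalone.sh -c standalone-full-ha.xml -Djboss.socket.binding.port-offset=1000 &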

      Pass Criteria: All sent messages are received by the consumers.

      Actual Result: Some messages are lost.

      Investigation:
      The lost messages were sent to the 2nd node; however, they got stuck in the queue .artemis.internal.sf.my-cluster.8a7e9e98-2c36-11e8-9737-fa163ea20b26 during load balancing to the 1st server.
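
      The stuck messages can be located from the management CLI. This is a hedged sketch, assuming the default messaging-activemq server name ("default"), a port offset of 1000 on node-2 (management on 10990), and that the internal store-and-forward queue is exposed as a runtime-queue resource with a message-count metric; the queue name is the one quoted above.

      # list the runtime queues on node-2 to find the exact store-and-forward queue name
      $JBOSS_HOME_2/bin/jboss-cli.sh --connect --controller=localhost:10990 \
        --command="ls /subsystem=messaging-activemq/server=default/runtime-queue"

      # inspect the cluster connection that owns the store-and-forward queue
      # ("my-cluster" in the full-ha profile)
      $JBOSS_HOME_2/bin/jboss-cli.sh --connect --controller=localhost:10990 \
        --command="/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:read-resource(include-runtime=true)"

      # a non-zero message-count after the cluster has settled means messages are stuck
      # in the cluster bridge instead of being delivered to the 1st node
      $JBOSS_HOME_2/bin/jboss-cli.sh --connect --controller=localhost:10990 \
        --command='/subsystem=messaging-activemq/server=default/runtime-queue=".artemis.internal.sf.my-cluster.8a7e9e98-2c36-11e8-9737-fa163ea20b26":read-attribute(name=message-count)'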

      I'm attaching trace logs from the client and both servers, and the content of the journal from the 2nd server.

      This is a regression against Artemis 1.5.5, thus setting blocker priority.

      Attachments

        Issue Links

          Activity

            People

              jondruse@redhat.com Jiri Ondrusek
              mnovak1@redhat.com Miroslav Novak
              Michal Toth
              Votes: 0
              Watchers: 5
