Details

Type: Bug
Resolution: Obsolete
Priority: Critical
Fix Version/s: None
Affects Version/s: 7.2.1.CR1
Component/s: ActiveMQ
Labels: None

Target Release: 7.2.z.GA


Description

Test scenario:

Start two nodes in a cluster in a collocated HA topology with a shared journal
- the journal is located on a local disk (not NFS/GFS2)
Start a producer and send messages to inQueue on node-1, then wait for the producer to finish
Kill node-2 and start it again
Start a consumer and consume messages from inQueue on node-1

Expected result:
node-2 starts and the consumer receives all messages.

Actual result:
node-2 sometimes does not start after being killed.

Logs and a thread dump from node-2, which hangs during startup, are attached.

Investigation:
It seems that the live Artemis server in node-2 is sometimes unable to acquire the lock on the journal:

"ServerService Thread Pool -- 85" #159 prio=5 os_prio=0 cpu=160.92ms elapsed=235.68s tid=0x00007fe3b82a4000 nid=0x291 waiting on condition  [0x00007fe39d5ae000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(java.base@11.0.2/Native Method)
	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:308)
	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.startLiveNode(FileLockNodeManager.java:168)
	at org.apache.activemq.artemis.core.server.impl.SharedStoreLiveActivation.run(SharedStoreLiveActivation.java:68)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:544)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:481)
	- locked <0x00000000d4ab7640> (a org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl)
	at org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.start(JMSServerManagerImpl.java:376)
	- locked <0x00000000d4ab7460> (a org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl)
	at org.wildfly.extension.messaging.activemq.jms.JMSService.doStart(JMSService.java:206)
	- locked <0x00000000d1706148> (a org.wildfly.extension.messaging.activemq.jms.JMSService)
	at org.wildfly.extension.messaging.activemq.jms.JMSService.access$000(JMSService.java:65)
	at org.wildfly.extension.messaging.activemq.jms.JMSService$1.run(JMSService.java:100)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.2/Executors.java:515)
	at java.util.concurrent.FutureTask.run(java.base@11.0.2/FutureTask.java:264)
	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1378)
	at java.lang.Thread.run(java.base@11.0.2/Thread.java:834)
	at org.jboss.threads.JBossThread.run(JBossThread.java:485)
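
For readers unfamiliar with this pattern, the trace above shows a thread sleeping inside a lock-acquisition call. The following is a minimal sketch of a poll-and-sleep file lock loop using plain java.nio. It is not the FileLockNodeManager code; the class name LockRetrySketch, the method acquireWithTimeout, and the timeout and pause parameters are assumptions made for illustration. It only shows how a caller can appear stuck in TIMED_WAITING while the lock stays held elsewhere, and how a bounded wait would surface the failure instead.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the Artemis implementation.
public class LockRetrySketch {

    /**
     * Polls for an exclusive lock on the file until the timeout expires.
     * Returns the lock, or null if it could not be acquired in time.
     * On success the caller must release the lock and close lock.channel().
     */
    public static FileLock acquireWithTimeout(Path file, long timeoutMillis, long pauseMillis)
            throws IOException, InterruptedException {
        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        boolean acquired = false;
        try {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            do {
                FileLock lock = null;
                try {
                    // Returns null when another process holds an overlapping lock.
                    lock = channel.tryLock();
                } catch (OverlappingFileLockException e) {
                    // The lock is already held within this JVM; treat as not acquired.
                }
                if (lock != null) {
                    acquired = true;
                    return lock;
                }
                // A thread dump taken here shows TIMED_WAITING (sleeping).
                Thread.sleep(pauseMillis);
            } while (System.nanoTime() - deadline < 0);
            return null;
        } finally {
            if (!acquired) {
                // Note: on some platforms closing any channel on the file can drop
                // locks held by this JVM, so this demo is single-JVM only.
                channel.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("lock-sketch", ".lock");
        FileLock first = acquireWithTimeout(file, 1000, 50);
        System.out.println("first attempt acquired: " + (first != null));
        // The lock is still held, so this attempt waits and then gives up.
        FileLock second = acquireWithTimeout(file, 300, 50);
        System.out.println("second attempt acquired: " + (second != null));
        first.release();
        first.channel().close();
        Files.deleteIfExists(file);
    }
}
```

With an unbounded loop of this shape, a lock that is never released by its holder would leave the caller sleeping indefinitely, which would be consistent with the hang described here; whether that is the actual cause is not established by this report.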

Customer impact:
The server does not fully boot, and it is not possible to return to the original state after a server crash. Manual intervention is required.

Tested on RHEL 7 (JDK 8/11).

Attachments

NettyColocatedClusterFailoverTestCase.testKillInClusterSmallMessages.zip (374 kB, 2019/03/28 11:02 AM)
startup-thread-dump.txt (298 kB, 2019/03/28 11:02 AM)

Issue Links

is cloned by: ENTMQBR-2389 Sometimes server in collocated HA topology with shared store does not boot after kill and restart (Closed)

People

Assignee: Justin Bertram

Reporter: Miroslav Novak

Votes: 0

Watchers: 3

Dates

Created: 2019/03/28 11:01 AM

Updated: 2021/04/07 3:03 AM

Resolved: 2021/04/07 3:03 AM