
Details

Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: AMQ 7.7.0.CR4
Component/s: core-jms-client
Labels: None
Blocked: False
Ready: False
Release Note Text: Undefined
Steps to Reproduce:

I'm attaching the broker configuration files and my client's trace logs. Note that I'm using SIGSTOP/SIGCONT to simulate the freeze. The end result is the same as in the customer's case, where the freeze is caused simply by taking a heap dump on the master.

# master (host0 61616), slave (host1 61617)
# client connection URL:
(tcp://127.0.0.1:61616,tcp://127.0.0.2:61617)?ha=true&retryInterval=1000&retryIntervalMultiplier=1.0&reconnectAttempts=-1
# start master and slave brokers
rm -rf data log && bin/artemis-service start && tail -f log/artemis.log
# start the consumer application
mvn clean compile exec:java -Pcon
# pause master's JVM process
PID=$(ps -e | grep [h]ost0 | awk '{print $1}'); kill -SIGSTOP $PID
# wait for consumer to fail over to slave
# release master's JVM process
PID=$(ps -e | grep [h]ost0 | awk '{print $1}'); kill -SIGCONT $PID
# at this point we have a split-brain, as there is no quorum to mitigate it
# stop the slave process
# consumer fails back to master (CheckpointA)
# restart the master
# consumer is able to reconnect without restart (CheckpointB)

In my tests, CheckpointA fails right after the client logs "Reconnection successful". It fails with the exception "AMQ219013: Timed out waiting to receive cluster topology. Group:null". CheckpointB also fails, but no exception is logged: the consumer logs "Reconnection successful", but no messages are received.
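The client URL in the steps sets retryInterval=1000, retryIntervalMultiplier=1.0 and reconnectAttempts=-1. The minimal Java sketch that follows prints the reconnect delays those values imply. The class and method names are hypothetical. The sketch assumes each delay is the previous one multiplied by retryIntervalMultiplier and capped at maxRetryInterval. The 2000 ms cap is an assumed default, not a value taken from the attached configuration. Under these assumptions, a multiplier of 1.0 means the client retries about once per second. Because reconnectAttempts=-1, it never stops retrying, which is consistent with the failover and failback behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reconnect delays implied by the client URL, ASSUMING
// delay(n+1) = min(delay(n) * retryIntervalMultiplier, maxRetryInterval).
public class RetrySchedule {

    // Returns the delays (in ms) before the first `count` reconnect attempts.
    public static List<Long> delays(long retryInterval, double multiplier,
                                    long maxRetryInterval, int count) {
        List<Long> out = new ArrayList<>();
        long interval = retryInterval;
        for (int i = 0; i < count; i++) {
            out.add(interval);
            // grow the interval by the multiplier, but never beyond the cap
            interval = Math.min((long) (interval * multiplier), maxRetryInterval);
        }
        return out;
    }

    public static void main(String[] args) {
        // retryInterval=1000, retryIntervalMultiplier=1.0 from the URL;
        // the 2000 ms cap is an assumed default, not from broker.xml
        System.out.println(delays(1000, 1.0, 2000, 5));
        // reconnectAttempts=-1: the real client would keep retrying forever
    }
}
```

With a multiplier of 1.0 every delay stays at 1000 ms. A multiplier such as 2.0 would double the delay until it reaches the cap.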

SFDC Cases Counter:
SFDC Cases Links:

Description

When the master broker "freezes" (caused by a heap dump) in a single-node master-slave replication setup, clients fail over to the slave broker as expected. When the master "unfreezes", we have a split-brain; this is also expected. At this point, if we stop the slave broker, the JMS clients try to fail back to the master broker after N retries. They are able to connect, but soon afterwards they all fail with the following error (all new connections get the same exception):

javax.jms.JMSException: Failed to create session factory
	at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnectionInternal(ActiveMQConnectionFactory.java:886) ~[artemis-jms-client-2.13.0.redhat-00006.jar:2.13.0.redhat-00006]
	at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:299) ~[artemis-jms-client-2.13.0.redhat-00006.jar:2.13.0.redhat-00006]
	at it.fvaleri.integ.ApplicationUtil.openConnection(ApplicationUtil.java:58) ~[classes/:?]
	at it.fvaleri.integ.Application.<init>(Application.java:15) [classes/:?]
	at it.fvaleri.integ.Application.main(Application.java:31) [classes/:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_252]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_252]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
	at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282) [exec-maven-plugin-1.6.0.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ219013: Timed out waiting to receive cluster topology. Group:null
	at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:712) ~[artemis-core-client-2.13.0.redhat-00006.jar:2.13.0.redhat-00006]
	at org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory.createConnectionInternal(ActiveMQConnectionFactory.java:884) ~[artemis-jms-client-2.13.0.redhat-00006.jar:2.13.0.redhat-00006]
	... 10 more
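The trace shows that the JMSException thrown by createConnection wraps the ActiveMQConnectionTimedOutException (AMQ219013) as its cause. The Java sketch that follows shows one way a client could recognize that case: it walks the cause chain and retries a bounded number of times. This is only a hedged sketch. TopologyTimeoutRetry, isTopologyTimeout and openWithRetry are hypothetical names. The Callable stands in for a call such as ApplicationUtil.openConnection. Matching on the "AMQ219013" message prefix is an assumption, not a documented API contract. The sketch does not address the underlying topology problem.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries a connection-opening call while the failure
// is the AMQ219013 topology timeout seen in the stack trace.
public class TopologyTimeoutRetry {

    // Walks the cause chain, since the JMSException wraps the real error as its cause.
    public static boolean isTopologyTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            // ASSUMPTION: the error code prefix identifies the topology timeout
            if (msg != null && msg.startsWith("AMQ219013")) {
                return true;
            }
        }
        return false;
    }

    // Calls `open` up to maxAttempts times, sleeping delayMs between topology timeouts.
    public static <T> T openWithRetry(Callable<T> open, int maxAttempts, long delayMs)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return open.call();
            } catch (Exception e) {
                // rethrow any other error, or the last topology timeout
                if (!isTopologyTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
            }
        }
    }
}
```

For example, a client might call `openWithRetry(() -> factory.createConnection(), 10, 1000)`, where `factory` is a hypothetical, already configured JMS connection factory.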

Attachments


broker.xml.host0
6 kB
2020/11/25 11:26 AM
broker.xml.host1
6 kB
2020/11/25 11:26 AM
client.log
18 kB
2020/11/25 11:26 AM
jms-client.tar.gz
6 kB
2020/12/10 3:26 AM

Issue Links

is cloned by

ENTMQBR-4416 Empty broker topology returned to the clients

Closed

JMS client unable to reconnect after master freeze
