Details

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: AMQ 7.4.4.CR1
Affects Version/s: AMQ 7.5.0.GA, AMQ 7.4.1, AMQ 7.4.2.GA
Component/s: broker-core, journal
Labels:
- Regression

QE Test Coverage:
-
Release Note Text:

Previously, if you had a live-backup broker pair configured for high availability using shared store, activation of the backup broker upon shutdown of the live broker could fail. Specifically, this situation occurred if the shared store had previously been disconnected and reconnected, before shutdown of the live broker. This issue is now resolved.

Release Note Status:
Documented as Resolved Issue
Target Release:

AMQ 7.4.4.GA
Upstream Jira:
https://issues.apache.org/jira/browse/ARTEMIS-2441, https://issues.apache.org/jira/browse/ARTEMIS-2567
Verified:
Verified in a release
Git Pull Request:
https://github.com/rh-messaging/activemq-artemis/pull/328
Steps to Reproduce:

To reproduce this issue, I used a dual-homed broker host. One interface shares a network (192.168.100.0) with the master broker and is used for cluster communications; the second interface (10.0.0.0) is used for communication with the NFS server.

Mount options for the share are as below:

10.0.0.10:/var/nfs on /opt/nfs type nfs4 (rw,sync,lookupcache=none,actimeo=0,noac,soft,addr=10.0.0.10,clientaddr=10.0.0.11
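The `soft` option in the line above is worth confirming when reproducing, because a soft NFS mount returns an I/O error to the caller once its retries are exhausted rather than blocking indefinitely. The following is a minimal sketch for checking a single option in a mount line; the function name `has_mount_option` is hypothetical, and it assumes the `(opt1,opt2,...)` layout shown above.

```shell
#!/bin/bash
# Sketch: return 0 if a mount output line lists the named option.
# Assumes the options appear as a comma-separated list after the first "(".
has_mount_option() {
  local line="$1" opt="$2" opts
  opts="${line#*\(}"   # drop everything up to and including the first "("
  opts="${opts%\)*}"   # drop a trailing ")" if one is present
  case ",${opts}," in
    *",${opt},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_mount_option "$(mount | grep ' /opt/nfs ')" soft && echo "soft mount"` would report whether the share is soft-mounted.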

To trigger the issue, I started both the master and slave brokers and waited for the master to go live and the slave to announce itself as backup. Once both brokers were up, I interrupted the network interface between the slave broker and the NFS server for just over one minute (61 seconds):

#!/bin/bash
sleep 2
echo "Interrupting network"
ip link set eth1 down
sleep 61
ip link set eth1 up
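When repeating the test several times, a parameterized variant of the interruption may be convenient. The sketch below is not part of the original report: the function name `interrupt_link` and the `IP_CMD` override (which allows a dry run without root) are hypothetical. It runs in a subshell and restores the interface when that subshell exits.

```shell
#!/bin/bash
# Sketch: take an interface down for a number of seconds, then bring it back up.
# IP_CMD is a hypothetical override for dry runs; it defaults to the real "ip".
interrupt_link() (
  iface="$1"
  seconds="$2"
  ip_cmd="${IP_CMD:-ip}"
  # Restore the link whenever this subshell exits, including on early failure.
  trap '"$ip_cmd" link set "$iface" up' EXIT
  echo "Interrupting network on ${iface} for ${seconds}s"
  "$ip_cmd" link set "$iface" down || exit 1
  sleep "$seconds"
)
```

Calling `interrupt_link eth1 61` as root would correspond to the interruption described above, without the initial two-second delay.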

After the script completes, stop the master broker.

The slave logs the loss of its connection to the master and attempts to start, which fails with the stack trace shown in the description.

I could not reproduce the issue on the non-LTS 7.5.0 release.

SFDC Cases Counter:
SFDC Cases Links:

Description

In a shared-store configuration, if the slave broker loses its connection to the NFS service and the connection is later restored, a subsequent failover to the slave fails during broker startup with:

2020-02-19 15:28:32,076 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.io.IOException: Input/output error
	at sun.nio.ch.FileDispatcherImpl.pread0(Native Method) [rt.jar:1.8.0_232]
	at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52) [rt.jar:1.8.0_232]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220) [rt.jar:1.8.0_232]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) [rt.jar:1.8.0_232]
	at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:735) [rt.jar:1.8.0_232]
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:721) [rt.jar:1.8.0_232]
	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.getState(FileLockNodeManager.java:256) [artemis-server-2.9.0.redhat-00009.jar:2.9.0.redhat-00009]
	at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:135) [artemis-server-2.9.0.redhat-00009.jar:2.9.0.redhat-00009]
	at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77) [artemis-server-2.9.0.redhat-00009.jar:2.9.0.redhat-00009]
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3738) [artemis-server-2.9.0.redhat-00009.jar:2.9.0.redhat-00009]

The journal store remains visible on the slave host, and restarting the slave broker results in a normal startup in live mode.
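To confirm from the slave host that the shared store is readable again after the interruption, a read from a new process can be attempted. This is a minimal sketch with a hypothetical function name, `probe_read`; the file to probe is left to the caller, since the report does not give the journal or lock file location. A successful read here may coexist with the broker's failure, because the stack trace shows the error on a read through a channel the broker already held.

```shell
#!/bin/bash
# Sketch: report whether a file can be read by a freshly started process.
# The path is supplied by the caller; no store location is assumed.
probe_read() {
  local file="$1"
  if head -c 1 -- "$file" >/dev/null 2>&1; then
    echo "readable: $file"
  else
    echo "read failed: $file"
    return 1
  fi
}
```

A non-zero return status indicates the read failed, which makes the function usable in a loop that waits for the mount to recover.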

Attachments

Issue Links

is related to
ENTMQBR-2147 (7.2.z) Backup doesn't activate after shared store is reconnected (Closed)

relates to
ENTMQBR-3213 Failback does not work master/slave cluster using NFS shared store (Closed)

Activity

People

Assignee: Domenico Francesco Bruscino

Reporter: Duane Hawkins

Tester: Tiago Bueno

Votes: 0

Watchers: 3

Dates

Created: 2020/02/19 3:38 PM

Updated: 2023/09/07 11:52 PM

Resolved: 2020/07/14 3:51 PM