Details

Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: EAP_EWP 5.1.2 ER2, EAP_EWP 5.1.2
Affects Version/s: EAP_EWP 5.1.1, EAP_EWP 5.1.2 CR1, EAP_EWP 5.1.2 CR3, EAP_EWP 5.1.2 CR4
Component/s: HornetQ
Labels:
None
Environment:

RHEL 6 x86-64 with GFS2/SAN

Release Note Text:

In a situation with clustered HornetQ instances where a cluster node has its journal disconnected - e.g. when the server loses its connection to the SAN - the other nodes did not take over in place of the failed node. This problem has now been fixed and failover from a failed HornetQ node now occurs without interruption to the client.

Release Note Status:
Documented as Resolved Issue
Docs QE Status:
NEW

Description

Hi Clebert,

As we agreed, we have started developing tests with a disconnected journal according to the HornetQ test plan (section 10). For now, all test scenarios are failing because the HornetQ architecture was not initially designed to handle such a situation. I'd like to share the current test results and some information about the testing environment here.

Test Scenario - "Node is disconnected from journal" - collocated backup (corresponds to section 10.1.1):
1. Start cluster - EAP servers A and B
2. Start a "live" producer and a "live" consumer connected to server A, sending messages to and receiving them from "liveQueue" - active for the whole duration of the test
3. Start a producer - send 1000 messages to "testQueue" on server A
4. Disconnect SAN from server A
5. Start consumer - read from server B from "testQueue"

Pass criteria:
After step 4, the backup node takes over the live role.
Clients are reconnected to the backup node and are able to continue their work.

Test results:
After step 4:

EAP server B does not take over - the backup does not become live
The "live" producer and consumer end with an exception (see the attached logs) and do not fail over to EAP server B
In step 5, the consumer on EAP node B is able to read only half of the messages sent to "testQueue" in step 3 (load balancing)
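
To illustrate the last result: a minimal sketch, assuming the 1000 messages from step 3 are spread evenly (round-robin) across the two cluster nodes. The function and variable names are hypothetical and this is not HornetQ code; the real distribution depends on the cluster configuration. It only shows why a consumer on server B would see about half of the messages once server A's journal is unreachable.

```python
from collections import Counter
from itertools import cycle


def distribute_round_robin(message_count, nodes):
    """Assign each message to the next node in turn (assumed even load balancing)."""
    placement = Counter({node: 0 for node in nodes})
    for _, node in zip(range(message_count), cycle(nodes)):
        placement[node] += 1
    return placement


# Step 3: 1000 messages sent to "testQueue" via server A, balanced over A and B.
placement = distribute_round_robin(1000, ["A", "B"])

# Step 4: server A loses its journal, so only messages stored on B stay readable.
readable_after_san_loss = placement["B"]

print(placement["A"], placement["B"], readable_after_san_loss)
```

Under these assumptions, the messages placed on server A stay stranded until its journal is reachable again or a backup takes over.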

Note about testing environment:
GFS2/SAN uses the "fenced" daemon, which powers off nodes that have failed. Disconnecting the SAN triggers this, but it takes a couple of minutes. In our test scenario, after step 4 new clients can still connect to EAP server A but fail to read or send any messages. EAP server B only delivers the messages that are in its own journal when clients connect to it.

Do we already have a solution for this?

Thank you,

Mirek

Attachments


jmsClient.zip (127 kB, 2011/12/06 7:15 AM)
live_consumer.log (5 kB, 2011/09/16 4:07 AM)
live_producer.log (10 kB, 2011/09/16 4:07 AM)
logs.zip (3.86 MB, 2012/01/04 11:51 AM)
logs.zip (1.03 MB, 2011/12/06 5:58 AM)
newJmsClient.zip (92 kB, 2012/01/11 11:56 AM)
reproducer.zip (9.04 MB, 2011/12/06 5:58 AM)
san_consumer_threaddump.txt (12 kB, 2012/01/11 11:56 AM)
serverA.log (79 kB, 2011/09/16 4:07 AM)
server-A-threaddump.txt (157 kB, 2012/01/11 11:56 AM)
serverB.log (37 kB, 2011/09/16 4:07 AM)
server-B-threaddump.txt (187 kB, 2012/01/11 11:56 AM)

Issue Links

is related to

JBPAPP-7870 Node disconnected from journal hangs on shutdown (Closed)

People

Assignee: Clebert Suconic

Reporter: Miroslav Novak

Writer: Russell Dickenson (Inactive)

Votes: 0

Watchers: 4

Dates

Created: 2011/09/15 9:10 AM

Updated: 2012/01/12 9:16 AM

Resolved: 2012/01/12 9:16 AM