Details

Type: Bug
Resolution: Can't Do
Priority: Major
Fix Version/s: None
Affects Version/s: JBossAS-4.0.1 Final
Component/s: Clustering, JMS (JBossMQ)
Labels: None

Forum Reference: http://community.jboss.org/thread/48063?tstart=0

Description

I'm running into what looks like a bug in HA JMS. At some point while the cluster is running and working fine, the cluster membership starts changing, and the singleton JMS destination moves from one box to another. I analyzed my log files from one particular occurrence of this bug: the singleton service moved back and forth between the 2 boxes in my 2-box cluster 4 times over the course of 45 minutes. There was no apparent reason for the moves. Neither box was under any significant load, and both boxes were still connected to the network. If I invoke the showHistory operation on the DefaultPartition MBean in the JMX console, I see the following, which happens right around the time things stop working:

8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)
8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)
8/19/05 3:37 AM : New view: [10.67.89.133:1099, 10.67.89.132:1099] with viewId: 3 (old view: [10.67.89.132:1099, 10.67.89.133:1099] )
8/19/05 3:37 AM : setState called on partition
8/19/05 4:22 AM : Node suspected: liven:39030 (additional data: 17 bytes)
8/19/05 4:22 AM : New view: [10.67.89.132:1099] with viewId: 0 (old view: [10.67.89.133:1099, 10.67.89.132:1099] )
8/19/05 4:22 AM : setState called on partition
8/19/05 4:22 AM : New view: [10.67.89.132:1099, 10.67.89.133:1099] with viewId: 5 (old view: [10.67.89.132:1099] )

It looks like in the first case (at 3:37 AM), the ordering of the cluster view changed. Then at 4:22 AM, one of the nodes left the view entirely, but for less than a minute before rejoining.

When I came in this morning, all of the nodes in the cluster were connected to HA JMS without errors, and they saw 0 messages on the HA queue, which is hosted on 10.67.89.132. In that box's JMX console, the queue's QueueDepth is 0. However, in 10.67.89.133's JMX console, the QueueDepth for the same HA queue is 227. So messages are still sitting in the queue, but nobody is receiving them: every client connects to HA JMS, which is being served by 10.67.89.132, and the copy of the HA queue on that box doesn't see any messages. I also confirmed this just now by connecting to HA JMS from an independent command-line client, which also sees 0 messages on the HA queue.
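To make the view sequence above easier to follow, here is a minimal Java sketch that replays the four views from the showHistory output and reports which node would host the singleton after each one. It assumes, as a simplification, that the HA singleton runs on whichever node is listed first in the current view; that election rule and the names `ViewReplay` and `masters` are illustrative assumptions, not JBoss APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical replay of the DefaultPartition view history from this report.
 * ASSUMPTION: the HA singleton (HA JMS) runs on the node listed first in the
 * current view. This is a simplification for illustration only.
 */
public class ViewReplay {

    /** Returns the node that would host the singleton after each successive view. */
    static List<String> masters(List<List<String>> views) {
        List<String> result = new ArrayList<>();
        for (List<String> view : views) {
            // Assumed election rule: the first member of the view is the master.
            result.add(view.isEmpty() ? null : view.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> views = List.of(
            List.of("10.67.89.132:1099", "10.67.89.133:1099"), // view before 3:37 AM
            List.of("10.67.89.133:1099", "10.67.89.132:1099"), // viewId 3, 3:37 AM
            List.of("10.67.89.132:1099"),                      // viewId 0, 4:22 AM
            List.of("10.67.89.132:1099", "10.67.89.133:1099")  // viewId 5, 4:22 AM
        );
        List<String> hosts = masters(views);
        for (int i = 0; i < views.size(); i++) {
            System.out.println(views.get(i) + " -> singleton on " + hosts.get(i));
        }
    }
}
```

Under that assumed rule, the singleton would have moved to 10.67.89.133 at 3:37 AM and back to 10.67.89.132 at 4:22 AM, which would be consistent with the messages being visible only in 133's copy of the queue.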

People

Assignee: Unassigned

Reporter: Tim McCune (Inactive)

Votes: 0

Watchers: 1

Dates

Created: 2005/09/01 1:37 PM

Updated: 2005/09/01 3:15 PM

Resolved: 2005/09/01 2:05 PM