Application Server 3 4 5 and 6 / JBAS-2203

HA destination ends up on the wrong box

Details

    • Type: Bug
    • Resolution: Can't Do
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: JBossAS-4.0.1 Final
    • Component/s: Clustering, JMS (JBossMQ)
    • Labels: None

    Description

      I'm running into what looks like a bug in HA JMS. At some point, while the cluster is running and working fine, it starts shuffling things around: the singleton JMS destination moves from one box to another. I analyzed the log files from one particular occurrence, and the singleton service moved back and forth between the 2 boxes in my 2-box cluster 4 times in the course of 45 minutes. There was no apparent reason for the moves. Neither box was under any significant load, and both boxes were still connected to the network. If I invoke the showHistory operation on the DefaultPartition in the JMX console, I see the following, logged right around the time that things stop working:

      8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)
      8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)
      8/19/05 3:37 AM : New view: [10.67.89.133:1099, 10.67.89.132:1099] with viewId: 3 (old view: [10.67.89.132:1099, 10.67.89.133:1099] )
      8/19/05 3:37 AM : setState called on partition
      8/19/05 4:22 AM : Node suspected: liven:39030 (additional data: 17 bytes)
      8/19/05 4:22 AM : New view: [10.67.89.132:1099] with viewId: 0 (old view: [10.67.89.133:1099, 10.67.89.132:1099] )
      8/19/05 4:22 AM : setState called on partition
      8/19/05 4:22 AM : New view: [10.67.89.132:1099, 10.67.89.133:1099] with viewId: 5 (old view: [10.67.89.132:1099] )
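The view changes in the history above can be pulled out mechanically, which helps when comparing several occurrences. The following is a minimal Python sketch, not part of JBoss. The function names and the `first_changed` flag are illustrative, and it assumes that a change in the first member of the view is what matters for where the singleton ends up, which this report does not confirm.

```python
import re

# Matches "New view" entries in the format shown in the history excerpt.
VIEW_RE = re.compile(
    r"New view: \[(?P<new>[^\]]*)\] with viewId: (?P<id>\d+) "
    r"\(old view: \[(?P<old>[^\]]*)\]\s*\)"
)

# Sample entries copied from the showHistory output in this report.
HISTORY = [
    "8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)",
    "8/19/05 3:37 AM : New view: [10.67.89.133:1099, 10.67.89.132:1099] with viewId: 3 (old view: [10.67.89.132:1099, 10.67.89.133:1099] )",
    "8/19/05 4:22 AM : New view: [10.67.89.132:1099] with viewId: 0 (old view: [10.67.89.133:1099, 10.67.89.132:1099] )",
    "8/19/05 4:22 AM : New view: [10.67.89.132:1099, 10.67.89.133:1099] with viewId: 5 (old view: [10.67.89.132:1099] )",
]


def members(text):
    """Split a comma-separated member list into a clean list of addresses."""
    return [m.strip() for m in text.split(",") if m.strip()]


def view_changes(lines):
    """Return one dict per 'New view' entry, describing what changed."""
    changes = []
    for line in lines:
        match = VIEW_RE.search(line)
        if match is None:
            continue  # e.g. "Node suspected" or "setState" entries
        old, new = members(match["old"]), members(match["new"])
        changes.append({
            "view_id": int(match["id"]),
            "old": old,
            "new": new,
            "joined": [m for m in new if m not in old],
            "left": [m for m in old if m not in new],
            # Assumption: a different first member is the interesting event.
            "first_changed": bool(old) and bool(new) and old[0] != new[0],
        })
    return changes


if __name__ == "__main__":
    for change in view_changes(HISTORY):
        print(change["view_id"], change["new"], "first member changed:", change["first_changed"])
```

On the excerpt above, this flags a change of first member in the 3:37 AM reordering and again when 10.67.89.133 drops out at 4:22 AM.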

      It looks like in the first case (at 3:37 AM), the cluster view changed its ordering. Later, at 4:22 AM, one of the nodes left the view completely, but for less than a minute, and then it came back.

      When I came in this morning, all of the nodes in the cluster were connected to HA JMS fine and saw 0 messages on the HA queue, which was being hosted on 10.67.89.132. In the JMX console on that box, the queue's QueueDepth is 0. However, in 10.67.89.133's JMX console, the QueueDepth for the HA queue is 227. So there are messages still sitting in the queue, but nobody is receiving them: everyone connects to HA JMS, which is being served by 10.67.89.132, and the copy of the HA queue on that box doesn't see any of those messages. I also verified this by connecting to HA JMS from an independent command-line client just now, and it sees 0 messages on the HA queue.
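The symptom described above can be written as a simple check: any node that is not currently serving the HA queue but reports a non-zero QueueDepth is holding messages no client will receive. This is only an illustrative sketch. The function name is hypothetical, and the depths are assumed to be read by hand from each node's JMX console rather than fetched programmatically.

```python
def stranded_messages(queue_depths, active_host):
    """Return {node: depth} for nodes holding messages while not serving the HA queue.

    queue_depths: mapping of node address to the QueueDepth seen on that node.
    active_host:  the node currently serving HA JMS.
    """
    return {
        node: depth
        for node, depth in queue_depths.items()
        if node != active_host and depth > 0
    }


# Values taken from this report: .132 serves HA JMS, .133 still holds messages.
depths = {"10.67.89.132": 0, "10.67.89.133": 227}
stranded = stranded_messages(depths, active_host="10.67.89.132")
```

With the numbers from this report, `stranded` contains only 10.67.89.133 with its 227 messages.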

          People

            Assignee: Unassigned
            Reporter: Tim McCune (javajedi_jira) (Inactive)