  1. JBoss Enterprise Application Platform 4 and 5
  2. JBPAPP-5280

MessageSucker failures cause the delivery of the failed message to stall

Details

    • Bug
    • Resolution: Obsolete
    • Major
    • TBD EAP 5
    • EAP_EWP 5.1.0, EAP_EWP 5.1.1 ER2
    • Messaging
    • None
      The MessageSucker is responsible for migrating messages between different members of a cluster. The <methodname>onMessage</methodname> routine attempts to deliver messages to the local queue, among other tasks. If delivery failed, messages appeared to be lost. They were still in the database, but it was difficult to redeliver them.

      This was a defect in the JBoss Messaging component, and has been fixed in that component. The fix is complex, and is discussed in JBMESSAGING-1822, which is available at <ulink url="https://issues.jboss.org/browse/JBMESSAGING-1822" />. In short, a new layer of reliability has been added to the message delivery logic.
    • Documented as Resolved Issue

    Description

      The MessageSucker is responsible for migrating messages between different members of a cluster. It acts as a consumer on the remote queue, from which it receives messages destined for the queue on the local cluster member.

      The onMessage routine, at its most basic, does the following:

      • bookkeeping for the incoming message, including expiry
      • acknowledge the incoming message
      • attempt to deliver to the local queue

      When the delivery fails, messages appear to be lost. Messages being processed at the time of the failure are not redelivered, although they still exist in the database.
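
      The ordering above can be illustrated with a minimal in-memory sketch. It assumes, as the order of the list suggests, that the acknowledgement happens before the local delivery attempt, so a failed delivery leaves nothing to trigger a redelivery. All class and method names here are hypothetical and are not the JBoss Messaging classes; poll() merely stands in for "receive and acknowledge".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory model; none of these names are JBoss Messaging classes.
public class AckBeforeDeliverSketch {

    /** Messages that reached the local queue, and messages left stranded. */
    static final class Outcome {
        final List<String> delivered = new ArrayList<>();
        final List<String> stranded = new ArrayList<>();
    }

    /** Drains the remote queue, acknowledging each message before delivering it. */
    static Outcome drain(Queue<String> remoteQueue, String poisonMessage) {
        Outcome outcome = new Outcome();
        String message;
        // poll() stands in for "receive and acknowledge": once it returns,
        // the remote queue no longer offers the message to any consumer.
        while ((message = remoteQueue.poll()) != null) {
            try {
                deliverToLocalQueue(message, poisonMessage, outcome.delivered);
            } catch (IllegalStateException e) {
                // Already acknowledged, so nothing triggers a redelivery.
                outcome.stranded.add(message);
            }
        }
        return outcome;
    }

    /** Simulated local delivery that fails for one chosen message. */
    static void deliverToLocalQueue(String message, String poisonMessage, List<String> localQueue) {
        if (message.equals(poisonMessage)) {
            throw new IllegalStateException("Deliberate exception");
        }
        localQueue.add(message);
    }

    public static void main(String[] args) {
        Queue<String> remoteQueue = new ArrayDeque<>(Arrays.asList("m1", "m2", "m3"));
        Outcome outcome = drain(remoteQueue, "m2");
        // prints: delivered=[m1, m3] stranded=[m2]
        System.out.println("delivered=" + outcome.delivered + " stranded=" + outcome.stranded);
    }
}
```

      In this model the stranded message is gone from the remote queue and never reaches the local one, which matches the symptom described: the message still exists somewhere, but no consumer will see it again without outside intervention.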

      The only way I have found to trigger the redelivery of those messages is to redeploy the queue containing the messages and/or restart that app server. Obviously neither approach is acceptable.

      To trigger the error, I created a SOA cluster whose members share only the JMS database and nothing else. I modified the helloworld quickstart to display a counter of messages consumed, clustered the ESB queue, and then used Byteman to inject the faults.

      The Byteman rule is as follows; the quickstart will be attached.

      RULE throw every fifth send
      INTERFACE ProducerDelegate
      METHOD send
      AT ENTRY
      IF callerEquals("MessageSucker.onMessage", true) && (incrementCounter("throwException") % 5 == 0)
      DO THROW new IllegalStateException("Deliberate exception")
      ENDRULE

      This results in an exception being thrown for every fifth message. Once the delivery has quiesced, examine the JBM_MSG and JBM_MSG_REF tables to see the messages which have not been delivered.
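
      For readers unfamiliar with Byteman, the rule's condition can be approximated in plain Java. This is a sketch only: the class is hypothetical, and it assumes the counter starts at zero and that the increment returns the new value, so the fifth, tenth, fifteenth, and so on sends fail. In the real test only sends issued from MessageSucker.onMessage are counted, so the failure count depends on how many messages are actually migrated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the rule's counter; not Byteman or JBoss Messaging code.
public class EveryFifthSendSketch {

    private final AtomicInteger sendCounter = new AtomicInteger();

    /** Mirrors the rule's IF clause: increment first, then test the new value. */
    void send(String message) {
        if (sendCounter.incrementAndGet() % 5 == 0) {
            throw new IllegalStateException("Deliberate exception");
        }
        // A real send would hand the message to the local queue here.
    }

    /** Sends the given number of messages and returns the 1-based positions that failed. */
    static List<Integer> failedPositions(int messageCount) {
        EveryFifthSendSketch producer = new EveryFifthSendSketch();
        List<Integer> failed = new ArrayList<>();
        for (int position = 1; position <= messageCount; position++) {
            try {
                producer.send("message-" + position);
            } catch (IllegalStateException e) {
                failed.add(position);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(failedPositions(12)); // prints [5, 10]
    }
}
```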

      The cluster members are ports-default and ports-01; the client seeds the gateway by sending 300 messages to the default member.

      Adding up the counter from each server plus the message count from JBM_MSG results in 300 (or multiples thereof for more executions).
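
      The reconciliation described above amounts to a simple arithmetic check, sketched below. The helper is hypothetical; the per-server counters and the JBM_MSG row count still have to be read off the servers and the database by hand, and the numbers in main are illustrative only, not results from this report.

```java
// Hypothetical helper; the counter values and row count must be gathered by hand.
public class MessageReconciliationSketch {

    /**
     * Returns true when consumed messages plus undelivered rows account for
     * every message sent, i.e. the total is a positive multiple of messagesPerRun.
     */
    static boolean reconciles(int[] consumedPerServer, int undeliveredRows, int messagesPerRun) {
        int total = undeliveredRows;
        for (int consumed : consumedPerServer) {
            total += consumed;
        }
        return total > 0 && total % messagesPerRun == 0;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not results from the report.
        int[] consumedPerServer = {170, 104};
        System.out.println(reconciles(consumedPerServer, 26, 300)); // prints true
    }
}
```

      If the check fails, some messages are neither consumed nor present in JBM_MSG, which would point to a different problem from the stalled delivery described here.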

          People

            gaohoward Howard Gao
            kconner@redhat.com Kevin Conner (Inactive)
            Misty Stanley-Jones Misty Stanley-Jones (Inactive)