JBoss A-MQ / ENTMQ-876

Network connector with masterslave: connection does not fail over when network connectivity is interrupted


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: JBoss A-MQ 6.2
    • Affects Version/s: JBoss A-MQ 6.1
    • Component/s: None
    Steps to Reproduce

      1. Install three JBoss A-MQ 6.1.0 brokers on distinct virtual machines. I will call the brokers A (the upstream broker) and B1 and B2 (the two downstream brokers).
      2. Start B1 and B2, and attach JMS consumers such that they will consume all messages on some specified destination
      3. Create a network connector in broker A, as follows (a fuller activemq.xml context is sketched after this list):

      <networkConnector
           name="testbridge"
           uri="masterslave:(tcp://B1:61616,tcp://B2:61616)"
           networkTTL="1"
           userName="admin"
           password="admin"
           duplex="false"
           prefetchSize="1000">
      </networkConnector>
      

      4. Start broker A; perhaps check its logs to ensure that the network connector is up
      5. Attach a producer to A that will post any kind of message at full speed to the destination on which the consumers are listening
      6. Messages should be consumed by the consumer on B1, as B1 is first in the masterslave: URL
      7. Shut down B1 cleanly, and verify that the consumer on B2 is now receiving messages
      8. Start up B1
      9. Restart broker A, just to ensure that the system is stable
      10. Ensure that messages are now passing to one or the other consumer
      11. Shut down the network interface on the host of the broker that is receiving messages
      12. Note that message consumption does not switch to the other downstream broker. Messages accumulate on the upstream broker because it does not fail over
      13. Wait. The upstream broker does not fail over, however long it waits
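
      For orientation, the connector from step 3 sits inside the <networkConnectors> element of broker A's activemq.xml. A minimal sketch of the surrounding configuration (the brokerName and transportConnector values are illustrative assumptions, not taken from this report):

      <broker xmlns="http://activemq.apache.org/schema/core" brokerName="brokerA">
          <networkConnectors>
              <!-- Bridge to the downstream pair; B1 is tried first -->
              <networkConnector
                   name="testbridge"
                   uri="masterslave:(tcp://B1:61616,tcp://B2:61616)"
                   networkTTL="1"
                   userName="admin"
                   password="admin"
                   duplex="false"
                   prefetchSize="1000"/>
          </networkConnectors>
          <transportConnectors>
              <!-- Listener that the producer in step 5 connects to -->
              <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
          </transportConnectors>
      </broker>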


    Description

      A broker is configured to forward messages to two other brokers, using a network connector specified using a masterslave: URL. Messages placed on the upstream broker are forwarded to one of the two downstream brokers, according to the order specified in the URL. When one of the downstream brokers fails or is shut down (such that the JVM is no longer running), then the upstream broker detects the failure immediately, and switches to using the other broker.
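
      For background, the masterslave: scheme is documented shorthand for a static list of failover transports that is tried strictly in order, never randomized. The connector above should therefore behave roughly like this explicit spelling (my expansion; treat the exact query options as an assumption):

      <!-- Equivalent form of masterslave:(tcp://B1:61616,tcp://B2:61616):
           a non-randomized failover list in which B1 is preferred and B2
           is used only while B1 is unreachable -->
      <networkConnector
           name="testbridge"
           uri="static:(failover:(tcp://B1:61616,tcp://B2:61616)?randomize=false&amp;maxReconnectAttempts=0)"
           networkTTL="1"/>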

      However, when there is a failure of network connectivity between the upstream broker and the active downstream broker, the upstream broker does not always respond correctly. It detects the failure, because we see a message in the log if the log level is high enough:

      17:10:01,126 | DEBUG | r ReadCheckTimer | AbstractInactivityMonitor | 131 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610394 | No message received since last read check for tcp:///10.5.1.17:61617@15034. Throwing InactivityIOException.

      However, the upstream broker does not always switch over.
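
      The read check in that log line comes from the OpenWire inactivity monitor, whose maxInactivityDuration defaults to 30 seconds. As a diagnostic aid, the check can be tightened on the bridge URIs; a sketch (the 10000 ms value is an arbitrary choice, not something from this report):

      <!-- Declare the remote peer inactive after ~10 s of silence instead
           of the 30 s default, so the InactivityIOException fires sooner -->
      <networkConnector
           name="testbridge"
           uri="masterslave:(tcp://B1:61616?wireFormat.maxInactivityDuration=10000,tcp://B2:61616?wireFormat.maxInactivityDuration=10000)"
           networkTTL="1"/>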

      The problem is not fully reproducible. It seems that with the default prefetch size on the network connector (1000 messages), it is reproducible so long as there is a continuous flow of messages through the installation. If the message flow is slower or bursty, or the prefetch is set to a smaller number, it is less reproducible. With a prefetch size of 1, it does not appear to be reproducible at all in my tests.
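
      For reference, the low-prefetch variant that did not reproduce the hang differs from the step-3 connector only in the prefetch attribute:

      <!-- The bridge's network consumer takes one message at a time;
           with this setting the failover appeared to work in my tests -->
      <networkConnector
           name="testbridge"
           uri="masterslave:(tcp://B1:61616,tcp://B2:61616)"
           networkTTL="1"
           prefetchSize="1"/>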

      However, the problem is not simply that one downstream broker has prefetched its quota and then gone away, leaving no messages for the other. I can put tens of thousands of messages on the upstream broker, and see no attempt to forward anything to the downstream broker that is still running.

      Most bizarrely of all, sometimes I can stop a downstream broker, wait a minute, and then restart it; only then does the upstream broker fail over to the other downstream broker. It's as if the upstream broker knows it has to fail over (because we see the message in the log), but some network operation against the disconnected downstream broker is blocking it for some reason.

      Attachments

        1. activemq_may01_2.log
          471 kB
        2. activemq_may01.log
          707 kB
        3. jconsole_ss.png
          127 kB
        4. log_from_bridge_broker.txt
          388 kB
        5. may5.zip
          7 kB
        6. stack_dump_after_active_broker_killed.txt
          51 kB


          People

            Assignee: gtully@redhat.com (Gary Tully)
            Reporter: rhn-support-kboone (Kevin Boone)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: