  1. AMQ Broker
  2. ENTMQBR-107

How does A-MQ handle idle timeouts and heartbeats?

Details

    • Bug
    • Resolution: Done
    • Critical
    • Documentation (Ref Guide, User Guide, etc.)

    Description

      Is there documentation that describes how idle timeouts and heartbeats should work, for AMQP as well as for other protocols and client libraries?

      As far as I can tell, this is documented at https://activemq.apache.org/artemis/docs/1.2.0/connection-ttl.html but that page does not go into detail.

      Looking at the pull request, it seems to me that Artemis works the same way as RabbitMQ. In RabbitMQ, heartbeats are sent every idle_timeout / 2, and two consecutive missed heartbeats are treated as a failure. See https://www.rabbitmq.com/heartbeats.html

      Qpid worked the other way around. Heartbeats are sent every idle_timeout seconds, and if two consecutive heartbeats are missed (that is, after idle_timeout * 2), the server terminates the connection. Some qpid-proton clients advertise to the server an idle_timeout that is 1/2 of what the programmer set; some do not. See https://bugzilla.redhat.com/show_bug.cgi?id=1151446
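
To make the /2 and *2 concrete, here is a minimal Python sketch of the two schemes as described above. The names `HeartbeatPlan`, `rabbitmq_style` and `qpid_style` are hypothetical and exist only for illustration; they are not part of RabbitMQ, Qpid or Artemis, and the arithmetic reflects only my reading of the linked pages.

```python
from typing import NamedTuple


class HeartbeatPlan(NamedTuple):
    send_interval: float  # seconds between heartbeats
    failure_after: float  # seconds of silence before the peer is declared failed


def rabbitmq_style(idle_timeout: float) -> HeartbeatPlan:
    # Heartbeats every idle_timeout / 2; two consecutive misses mean failure,
    # so the peer is declared failed after one full idle_timeout of silence.
    interval = idle_timeout / 2
    return HeartbeatPlan(send_interval=interval, failure_after=2 * interval)


def qpid_style(idle_timeout: float) -> HeartbeatPlan:
    # Heartbeats every idle_timeout; two consecutive misses mean failure,
    # so the peer is declared failed after idle_timeout * 2 of silence.
    interval = idle_timeout
    return HeartbeatPlan(send_interval=interval, failure_after=2 * interval)


if __name__ == "__main__":
    for name, plan in (("rabbitmq", rabbitmq_style(60.0)), ("qpid", qpid_style(60.0))):
        print(name, plan.send_interval, plan.failure_after)
```

With the same configured value of 60 seconds, the first scheme detects a dead peer after 60 seconds and the second after 120 seconds, which is where the factor of two between the implementations comes from.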

      According to the AMQP 1.0 specification (section 2.4.5, Idle Timeout Of A Connection), a connection is closed if no data is sent for idle_timeout milliseconds. "To avoid spurious timeouts, the value in idle-time-out SHOULD be half the peer’s actual timeout threshold." Heartbeats (empty frames) may be used, but the specification does not say how often to send them.
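
One possible reading of the quoted sentence, as a hedged Python sketch: a peer that is willing to wait `actual_threshold_ms` advertises half of that value. All three function names are hypothetical, and the empty-frame interval in particular is my own assumption, because the specification does not prescribe one.

```python
def advertised_idle_timeout_ms(actual_threshold_ms: int) -> int:
    # Assumed reading of section 2.4.5: advertise half of the real threshold.
    return actual_threshold_ms // 2


def peer_timed_out(last_frame_ms: int, now_ms: int, actual_threshold_ms: int) -> bool:
    # The local side only gives up once its real (unhalved) threshold has passed.
    return now_ms - last_frame_ms > actual_threshold_ms


def empty_frame_interval_ms(peer_advertised_ms: int) -> int:
    # ASSUMPTION: send empty frames at half the value the peer advertised.
    # The specification leaves the sending frequency open.
    return peer_advertised_ms // 2


if __name__ == "__main__":
    advertised = advertised_idle_timeout_ms(60_000)
    print(advertised, empty_frame_interval_ms(advertised))
```

Under these assumptions, a peer with a real threshold of 60000 ms advertises 30000 ms, and the other side sends an empty frame every 15000 ms.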

      A similar heartbeat feature in WebSphere is configured using IBM_CS_FD_PERIOD and IBM_CS_FD_CONSECUTIVE_MISSED. A heartbeat is sent every IBM_CS_FD_PERIOD, and if more than IBM_CS_FD_CONSECUTIVE_MISSED consecutive heartbeats are not received, it is considered a failure. See https://www.ibm.com/support/knowledgecenter/SSTVLU_8.5.0/com.ibm.websphere.extremescale.doc/txsfailover.html
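
The "more than N consecutive misses" rule can be sketched as a small counter. This is only an illustration of the rule as I understand it from the linked page; `MissedHeartbeatCounter` and its methods are hypothetical names, not a WebSphere API, and the constructor argument merely mirrors IBM_CS_FD_CONSECUTIVE_MISSED.

```python
class MissedHeartbeatCounter:
    """Counts consecutive missed heartbeats; one check per heartbeat period."""

    def __init__(self, consecutive_missed_limit: int) -> None:
        self.limit = consecutive_missed_limit
        self.missed = 0

    def on_heartbeat(self) -> None:
        # Any received heartbeat resets the run of consecutive misses.
        self.missed = 0

    def on_period_without_heartbeat(self) -> bool:
        # Returns True once MORE than `limit` consecutive heartbeats were missed.
        self.missed += 1
        return self.missed > self.limit


if __name__ == "__main__":
    counter = MissedHeartbeatCounter(consecutive_missed_limit=2)
    print([counter.on_period_without_heartbeat() for _ in range(3)])
```

With a limit of 2, the first two silent periods are tolerated and the third one reports a failure, so detection takes roughly three periods in this sketch.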

      Q1: I am confused by all the /2 and *2. Should clients advertise 1/2 of their timeout, or the full value (1x)?
      Q2: How are we (QA) supposed to test ARTEMIS-143 (expose AMQP heartbeat functionality)?
      Q3: Should we call it "idle timeout" or "connection TTL"? The linked Artemis documentation uses the latter, yet it seems to me that TTL is usually a different concept, measured in "hops" (as in IP), not in seconds.
      Q4: More terminology: why does the Artemis documentation not use the term "heartbeat"? I guess it could also be called a "keep-alive".

            People

              rh-ee-ataylor Andy Taylor
              jdanek@redhat.com Jiri Daněk
              Jiri Daněk Jiri Daněk