JGroups / JGRP-1389

Synchronous messages

Details

    • Feature Request
    • Resolution: Done
    • Major
    • 3.1

    Description

      FLUSH ensures that every member has received all messages from all other members. This is quite costly, especially if we have a large cluster.

      However, it is sometimes only necessary to ensure that a given message M sent by P has been received by everyone. For example, if P is the coordinator and installs view V, it would be sufficient to ensure that everyone received V.

      Synchronous messages are therefore messages that block the sender until all (non-faulty) recipients have acknowledged reception, or a timeout occurs.

      Note that if P sends messages 5, 6, 7 and 8 and tags only 8 as synchronous, then when the send of 8 returns, JGroups guarantees that everyone has also received all messages from P lower than 8.

      A user should be able to configure whether the message send is complete when all recipients have received the message, or when they have delivered it.

      Reception of a message means that the message was added to the recipient's buffer; delivery means that the message was consumed by the application.
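
      The difference between the two completion modes can be illustrated with a small sketch. This is not JGroups code: `AckMode` and `BufferingReceiver` are hypothetical names, and the receiver is reduced to a buffer plus a list of acknowledged seqnos.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical names for illustration only; these are not JGroups classes.
enum AckMode { ON_RECEPTION, ON_DELIVERY }

class BufferingReceiver {
    private final AckMode mode;
    private final Queue<Long> buffer = new ArrayDeque<>(); // received, not yet delivered
    final List<Long> acked = new ArrayList<>();             // seqnos acknowledged so far
    final List<Long> delivered = new ArrayList<>();         // seqnos consumed by the application

    BufferingReceiver(AckMode mode) { this.mode = mode; }

    // Reception: the message is added to the recipient's buffer.
    void receive(long seqno) {
        buffer.add(seqno);
        if (mode == AckMode.ON_RECEPTION) acked.add(seqno);
    }

    // Delivery: the application consumes the oldest buffered message.
    void deliverNext() {
        Long seqno = buffer.poll();
        if (seqno == null) return; // nothing buffered
        delivered.add(seqno);
        if (mode == AckMode.ON_DELIVERY) acked.add(seqno);
    }
}
```

      With `ON_RECEPTION` the sender is unblocked as soon as the message is buffered; with `ON_DELIVERY` it also waits for the application to consume it.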

      A message M4 sent by P is sometimes delivered late because M4 is the last message sent by P: if members Q and R don't receive M4 (e.g. because it was dropped) and P doesn't send another message M5, then Q and R have no means of detecting that M4 was sent by P, and therefore cannot ask P for retransmission of M4.
      This is of course solved by STABLE, which periodically broadcasts the highest seqnos sent, so that Q and R can then ask P for retransmission of M4.
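
      The last-message problem can be shown with a minimal sketch of a per-sender receive window. `SenderWindow` and its methods are hypothetical and assume seqnos start at 1; the real retransmit tables in JGroups are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: tracks seqnos received from one sender and reports gaps.
class SenderWindow {
    private final TreeSet<Long> received = new TreeSet<>();
    private long highestKnown = 0; // highest seqno we know the sender has sent

    // A message from the sender arrived.
    void onMessage(long seqno) {
        received.add(seqno);
        highestKnown = Math.max(highestKnown, seqno);
    }

    // A digest announced the sender's highest sent seqno.
    void onDigest(long highestSent) {
        highestKnown = Math.max(highestKnown, highestSent);
    }

    // Seqnos in [1, highestKnown] that have not been received yet.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        for (long s = 1; s <= highestKnown; s++)
            if (!received.contains(s)) gaps.add(s);
        return gaps;
    }
}
```

      If Q has received 1 to 3 and M4 is dropped, `missing()` stays empty until either a later message or a digest raises `highestKnown` to 4.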

      However, if we don't want to wait that long, and don't want to risk Q and R leaving before they've received M4, we can implement a sender-flush of M4 sent by P.

      This works as follows:

      • P can add an RSVP flag to M4
      • P adds M4 to a retransmit table and keeps retransmitting M4 until it has received ACKs from every non-faulty member in the target set, or P leaves
      • When a message is received that is tagged with RSVP, an ACK is sent back to the sender
      • When P has received ACKs from everyone, the message send returns. The caller is blocked until this is the case (possibly bounded by a timeout?)
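
      The sender side of the steps above could be sketched as follows. This is an assumption-laden illustration, not the eventual implementation: `RsvpEntry` is a hypothetical class, members are identified by plain strings, and the retransmission itself is left to a caller that polls `pending()`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the sender's bookkeeping for one RSVP-tagged message.
class RsvpEntry {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch done = new CountDownLatch(1);

    RsvpEntry(Set<String> targets) {
        pending.addAll(targets);
        if (pending.isEmpty()) done.countDown(); // nobody to wait for
    }

    // An ACK arrived from a member.
    void ack(String member) {
        if (pending.remove(member) && pending.isEmpty()) done.countDown();
    }

    // A member left or was declared faulty: stop waiting for its ACK.
    void memberLeft(String member) { ack(member); }

    // Members that still need the message retransmitted.
    Set<String> pending() { return new HashSet<>(pending); }

    // Blocks until all ACKs are in or the timeout elapses; true means all ACKs arrived.
    boolean await(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

      A send would create the entry, transmit M4, and call `await(...)`; a periodic task would retransmit to `pending()` until the set is empty or P leaves.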

      The reason why this works is that every receiver gets the latest message M4 from P, and adds it to its retransmit table. If there is a gap, it will ask P for retransmission of missing messages. This way, a receiver won't have to wait until STABLE sends a digest to find out it's missing messages from P.

      There could be an option for a receiver R to delay sending an ACK back to P until it has actually received P's missing messages. Otherwise, P could leave or crash before R got all of P's missing messages.
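
      The delayed-ACK option might look like the following on the receiver. This is only a sketch under stated assumptions: `DelayedAckReceiver` is a hypothetical class, it tracks a single sender, and seqnos are assumed to start at 1.

```java
import java.util.TreeSet;

// Hypothetical sketch: the receiver withholds its ACK for an RSVP message until
// every lower seqno from the same sender has arrived.
class DelayedAckReceiver {
    private final TreeSet<Long> received = new TreeSet<>();
    private long pendingRsvp = -1; // seqno of an RSVP message awaiting its ACK, or -1

    // Returns the seqno to ACK now, or -1 if no ACK should be sent yet.
    long onMessage(long seqno, boolean rsvp) {
        received.add(seqno);
        if (rsvp) pendingRsvp = Math.max(pendingRsvp, seqno);
        if (pendingRsvp != -1 && hasAllUpTo(pendingRsvp)) {
            long ack = pendingRsvp;
            pendingRsvp = -1;
            return ack;
        }
        return -1;
    }

    // Seqnos start at 1, so a gap-free prefix [1, seqno] holds exactly seqno entries.
    private boolean hasAllUpTo(long seqno) {
        return received.headSet(seqno, true).size() == seqno;
    }
}
```

      Here the ACK for an RSVP-tagged message is released only when the retransmitted gap has been filled, so P's blocked send also covers its earlier messages.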

          People

            rhn-engineering-bban Bela Ban