JGroups / JGRP-1570

STABLE: desired_avg_gossip leads to long intervals between reception of STABILITY messages in large clusters

    Details

    • Type: Feature Request
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 3.3
    • Labels:
      None

      Description

      The interval computed for sending STABLE messages is desired_avg_gossip * cluster-size * 2. While this is fine for small clusters, it may be too long for large clusters.
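
To make the scaling concrete, here is a minimal sketch of the formula quoted above. The class and method names are hypothetical, and the 20-second value for desired_avg_gossip is an assumption chosen for illustration; this is not the actual STABLE code.

```java
// Sketch only: shows how the interval from the description grows with cluster size.
final class StableIntervalSketch {

    /** Interval as stated in the issue: desired_avg_gossip * cluster size * 2, in milliseconds. */
    static long stableInterval(long desiredAvgGossipMs, int clusterSize) {
        return desiredAvgGossipMs * clusterSize * 2;
    }

    public static void main(String[] args) {
        long desiredAvgGossipMs = 20_000; // assumed value, for illustration only
        for (int size : new int[] {5, 100, 1000}) {
            long ms = stableInterval(desiredAvgGossipMs, size);
            System.out.printf("cluster size %4d -> %d s%n", size, ms / 1000);
        }
    }
}
```

Under these assumptions, 5 members give 200 seconds, while 1,000 members give 40,000 seconds (roughly 11 hours), which illustrates why the formula becomes a problem in large clusters.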
      On the other hand, if every member simply multicasts a STABLE message every (say) 30 seconds on average, then the number of messages sent grows with increasing cluster size.
      Investigate a way to set lower and upper bounds on how often STABILITY messages are generated and delivered; for example, the goal could be to receive one STABILITY message every 60s.

      Besides the increased traffic, having every member multicast STABLE messages also requires every member to hold a TCP connection to every other member when a TCP transport is used.
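
The cost of the "every member multicasts" approach can be estimated with some simple arithmetic. The sketch below uses hypothetical names and assumes one shared bidirectional TCP connection per pair of members; a transport that opens a separate connection per direction would double the connection count.

```java
// Sketch only: rough cost of every member multicasting STABLE at a fixed average interval.
final class EveryoneMulticastsCost {

    /** STABLE multicasts per minute across the cluster if each member sends once per interval on average. */
    static double multicastsPerMinute(int clusterSize, double intervalSeconds) {
        return clusterSize * (60.0 / intervalSeconds);
    }

    /** Connections in a full mesh, assuming one shared bidirectional connection per member pair. */
    static long fullMeshConnections(int clusterSize) {
        return (long) clusterSize * (clusterSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 100, 1000}) {
            System.out.printf("size %4d: %.0f multicasts/min, %d connections%n",
                    size, multicastsPerMinute(size, 30.0), fullMeshConnections(size));
        }
    }
}
```

With a 30-second interval, 100 members produce 200 multicasts per minute and a full mesh of 4,950 connections; the message rate grows linearly and the connection count quadratically.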

      A better solution might be to have only a dedicated member (the coord) periodically multicast a STABLE message. Every member replies with a unicast STABLE message, and once the coord has received STABLE replies from everyone, it multicasts a STABILITY message. This requires only a multicast from the coord to everyone, which means establishing TCP connections from the coord to every member (these usually already exist because of the VIEW-CHANGE multicast); each member would reuse that same TCP connection to send its reply.
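
The coordinator-driven round described above could be sketched roughly as follows. This is an illustrative model, not the JGroups implementation: the class name, the use of plain strings for members, and the representation of a digest as a map from sender to highest delivered seqno are all assumptions. It also assumes every reply covers the same set of senders.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: the coord collects unicast STABLE replies and, once all members
// have answered, produces the digest it would multicast as a STABILITY message.
final class CoordStabilityRound {
    private final Set<String> pending;                           // members that have not replied yet
    private final Map<String, Long> minSeqnos = new HashMap<>(); // per-sender minimum seen so far

    CoordStabilityRound(Collection<String> members) {
        this.pending = new HashSet<>(members);
    }

    /**
     * Handles one STABLE reply. Returns the STABILITY digest when the last
     * outstanding member has replied, otherwise null.
     */
    Map<String, Long> onStableReply(String from, Map<String, Long> digest) {
        if (!pending.remove(from)) {
            return null; // unknown member or duplicate reply: ignore
        }
        // A message is stable only if everyone has delivered it, so keep the minimum per sender.
        digest.forEach((sender, seqno) -> minSeqnos.merge(sender, seqno, Long::min));
        return pending.isEmpty() ? new HashMap<>(minSeqnos) : null;
    }

    public static void main(String[] args) {
        CoordStabilityRound round = new CoordStabilityRound(List.of("A", "B", "C"));
        round.onStableReply("A", Map.of("A", 10L, "B", 7L));
        round.onStableReply("B", Map.of("A", 9L, "B", 8L));
        Map<String, Long> stability = round.onStableReply("C", Map.of("A", 10L, "B", 5L));
        System.out.println("STABILITY digest: " + stability);
    }
}
```

In this model the coord sends one multicast to start the round and one to announce stability, while each member sends a single unicast reply over the connection it already shares with the coord.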

                People

                • Assignee:
                  Bela Ban (belaban)
                • Reporter:
                  Bela Ban (belaban)