JGroups / JGRP-1570

STABLE: desired_avg_gossip leads to long intervals between reception of STABILITY messages in large clusters

    Details

    • Type: Feature Request
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 3.3
    • Labels:
      None

      Description

      The interval between STABLE messages is computed as desired_avg_gossip * cluster-size * 2. While this is OK for small clusters, it may become far too long for large clusters.
      On the other hand, if every member simply multicasts a STABLE message every (say) 30 seconds on average, then the number of messages sent grows with increasing cluster size.
      Investigate a way to set a lower and an upper limit for the generation and delivery of STABILITY messages, e.g. with the goal of receiving one STABILITY message every 60s.
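      As an illustration, here is a minimal Java sketch of how the current scaled interval grows with the cluster size and how a lower/upper bound could cap it. All names and values are made up for the example and are not the actual STABLE fields:

      public class StableIntervalSketch {
          // Current behavior: the average gossip interval is scaled by the cluster size.
          static long scaledInterval(long desiredAvgGossip, int clusterSize) {
              return desiredAvgGossip * clusterSize * 2;
          }

          // Sketched idea: clamp the interval between a lower and an upper limit.
          static long boundedInterval(long desiredAvgGossip, int clusterSize, long min, long max) {
              return Math.min(max, Math.max(min, desiredAvgGossip * clusterSize * 2));
          }

          public static void main(String[] args) {
              long gossip = 20_000; // example desired_avg_gossip of 20s
              for (int size : new int[] {4, 100, 1000})
                  System.out.printf("size=%d scaled=%dms bounded=%dms%n",
                          size, scaledInterval(gossip, size),
                          boundedInterval(gossip, size, 10_000, 60_000));
          }
      }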

      Besides the increased traffic, having every member multicast STABLE messages requires everyone to have a TCP connection to everybody else in the cluster when a TCP transport is used.

      A better solution might be to have only a dedicated member (the coord) periodically multicast a STABLE message. Everyone replies with a (unicast) STABLE message and when the coord has received STABLE replies from everyone, it multicasts a STABILITY message. This would only require a multicast from the coord to everyone, establishing TCP connections from the coord to everyone (usually already exists because of the VIEW-CHANGE multicast), but everyone would reuse the same TCP connection to send the reply.
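
      A rough sketch of such a coordinator-driven round (with illustrative placeholder types and methods, not the actual STABLE protocol code) might look like this:

      import java.util.*;

      public class CoordStabilitySketch {
          private final Set<String> pending = new HashSet<>(); // members we still expect a STABLE reply from

          void startRound(List<String> members) {
              pending.clear();
              pending.addAll(members);
              multicast("STABLE-REQUEST"); // single multicast from the coordinator
          }

          void onStableReply(String sender) {
              // Unicast replies reuse the existing TCP connection to the coordinator
              if (pending.remove(sender) && pending.isEmpty())
                  multicast("STABILITY"); // all replies received: members may purge delivered messages
          }

          private void multicast(String msg) {
              System.out.println("multicast: " + msg); // placeholder for the real transport
          }

          public static void main(String[] args) {
              CoordStabilitySketch coord = new CoordStabilitySketch();
              coord.startRound(List.of("A", "B", "C", "D"));
              for (String member : List.of("A", "B", "C", "D"))
                  coord.onStableReply(member);
          }
      }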


          Activity

          Bela Ban added a comment -

          Or possibly have every node periodically send a STABLE message to the coordinator, and the coord multicasts a STABILITY message when a STABLE msg from everyone has been received. This would eliminate the need for the first multicast by the coord. The cost would be N unicasts every desired_avg_gossip seconds and an ensuing multicast by the coord.

          Bela Ban added a comment -

          STABLE.max_bytes is also scaled with the cluster size; this may have to be changed, too. If max_bytes is 400K, with 2 members it becomes 800K and with 10 members 4MB. This affects how long it takes until a STABILITY message is sent, so to be in line with what was suggested above, we probably need to remove this scaling as well.
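
          For illustration, a few lines of Java reproducing the scaling described above (the values are just the ones from this comment):

          public class MaxBytesScaling {
              public static void main(String[] args) {
                  long maxBytes = 400_000; // configured max_bytes
                  for (int members : new int[] {2, 10})
                      System.out.printf("members=%d effective max_bytes=%d%n", members, maxBytes * members);
              }
          }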

          Bela Ban added a comment -

          The previous two comments apply to STABLE2 (JGRP-1595). The current issue is only about changing the defaults (desired_avg_gossip and max_bytes) so that they are no longer scaled with the cluster size.
          This is a simple change, but we need to investigate how it impacts:

          • STABLE / STABILITY messages required for a stable round
          • Performance
          Bela Ban added a comment -

          Here are some numbers for 4 nodes running MPerf, with every node sending 1 million 1K messages.
          Old = existing implementation, new = JGRP-1570 implemented (no scaling)

          Node      STABLE sent   STABLE received   STABILITY sent   STABILITY received
          A (old)        96             236               11                 64
          A (new)       436             484               27                130
          B (old)        97             236               15                 64
          B (new)       434             484               32                130
          C (old)        95             236               17                 64
          C (new)       437             484               34                130
          D (old)        96             235               21                 64
          D (new)       431             484               37                130

          The numbers show that every node sends far more STABLE messages without scaling, and receives about twice as many. Also, twice as many STABILITY messages are sent and received.
          However, this is not necessarily a bad thing, as more STABILITY messages mean a quicker purging of messages in the sender's (and possibly receiver's) caches, leading to better memory use.
          Performance was about the same between old and new.


            People

            • Assignee: Bela Ban
            • Reporter: Bela Ban
            • Votes: 0
            • Watchers: 1
