JGroups / JGRP-1658

GMS: Node re-joining the cluster during shutdown


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • 3.4
    • 3.3.1
    • None

      We have RSVP in the stack, with ack_on_delivery=true.

      It seems that node C receives an RSVP-flagged message just after sending the LEAVE_REQ to A and, immediately after sending the RSVP ACK, sends a JOIN_REQ as well.

      11:55:54,524 DEBUG (testng:) [DefaultCacheManager] Stopping cache manager ISPN on C
      11:55:54,525 DEBUG (testng:) [GMS] C: sending LEAVE request to A
      11:55:54,525 TRACE (testng:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[LEAVE_REQ]: mbr=C, UNICAST2: DATA, seqno=16, TCP: [channel_name=ISPN]
      11:55:54,526 TRACE (ViewHandler,A:) [GMS] A: new members=[], suspected=[], leaving=[C], new view: [A|4] [A, B, D]
      11:55:54,528 TRACE (OOB-3,C:) [TCP] C: received [dst: <null>, src: A (4 headers), size=7469 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=93579, rsp_expected=false, exclusion_list=[A], RSVP: REQ(7), NAKACK2: [MSG, seqno=9], TCP: [channel_name=ISPN]
      11:55:54,528 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are RSVP: RSP(7), UNICAST2: DATA, seqno=17, TCP: [channel_name=ISPN]
      11:55:54,529 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[JOIN_REQ]: mbr=C, UNICAST2: DATA, seqno=1, first, TCP: [channel_name=ISPN]
      11:55:54,613 TRACE (ViewHandler,A:) [GMS] A: new members=[C], suspected=[], leaving=[], new view: [A|5] [A, B, D, C]
      11:55:54,613 TRACE (ViewHandler,A:) [GMS] A: mcasting view [A|5] [A, B, D, C] (4 mbrs)
      11:55:54,841 DEBUG (testng:) [TEST_PING] Stop discovery for C
      11:55:54,841 DEBUG (testng:) [TCP] closing sockets and stopping threads
      11:55:55,683 TRACE (Timer-5,A:) [TCP] A: sending msg to C, src=A, headers are GMS: GmsHeader[JOIN_RSP]: join_rsp=view: [A|5] [A, B, D, C], digest: B: [0 (0)], D: [0 (0)], A: [11 (11)], C: [0 (0)], UNICAST2: DATA, seqno=1, conn_id=4, first, TCP: [channel_name=ISPN]
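
      The sequence in the log above (a LEAVE_REQ from C followed by a JOIN_REQ from the same member) can be checked for mechanically. The following is only a minimal sketch for scanning TRACE logs, not part of JGroups: the class name RejoinDetector and its method are hypothetical, and it assumes the lines are in chronological order and use the GmsHeader[...]: mbr=... format shown in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log analysis; it does not use or modify JGroups.
public class RejoinDetector {
    // Matches the GMS header as printed in the TRACE lines of this report,
    // e.g. "GmsHeader[LEAVE_REQ]: mbr=C". JOIN_RSP and other types are ignored.
    private static final Pattern GMS_REQ =
            Pattern.compile("GmsHeader\\[(LEAVE_REQ|JOIN_REQ)\\]: mbr=(\\w+)");

    /** Returns the members for which a JOIN_REQ appears after a LEAVE_REQ. */
    public static List<String> rejoinedWhileLeaving(List<String> logLines) {
        Set<String> leaving = new HashSet<>();
        List<String> rejoined = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = GMS_REQ.matcher(line);
            if (!m.find()) {
                continue; // not a LEAVE_REQ/JOIN_REQ line
            }
            String type = m.group(1);
            String member = m.group(2);
            if (type.equals("LEAVE_REQ")) {
                leaving.add(member);
            } else if (leaving.contains(member) && !rejoined.contains(member)) {
                rejoined.add(member); // JOIN_REQ from a member that already asked to leave
            }
        }
        return rejoined;
    }

    public static void main(String[] args) {
        // Two lines copied from the log excerpt in this report.
        List<String> lines = Arrays.asList(
            "11:55:54,525 TRACE (testng:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[LEAVE_REQ]: mbr=C, UNICAST2: DATA, seqno=16, TCP: [channel_name=ISPN]",
            "11:55:54,529 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[JOIN_REQ]: mbr=C, UNICAST2: DATA, seqno=1, first, TCP: [channel_name=ISPN]");
        System.out.println(rejoinedWhileLeaving(lines)); // prints [C]
    }
}
```

      Because the sketch matches on the header text only, it also counts lines where another node logs the same header on receipt; for spotting the pattern that is harmless, but it is not a precise per-sender trace.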
      

      A adds C back to the view, but C shuts down and never receives the JOIN_RSP message. The remaining members then keep logging this error until they are shut down 3 minutes later:

      11:59:01,346 TRACE (TransferQueueBundler,D:) [TCP] 127.0.0.1:8003: failed connecting to 127.0.0.1:8002: java.net.ConnectException: Connection refused
      

            rhn-engineering-bban Bela Ban
            dberinde@redhat.com Dan Berindei (Inactive)