JGroups / JGRP-2778

GMS: synchronize member operations

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 5.2.24, 5.3.5
    • Affects Version/s: 5.3.4

      When running a test in jgroups-raft that disconnects two channels in a row, the final view is not installed on the remaining members. The test fails at line https://github.com/jgroups-extras/jgroups-raft/blob/d8aedcb3753b404e621983f195023c7c1cae4870/tests/junit-functional/org/jgroups/tests/VoteTest.java#L162, while waiting for the view to update on all members.
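
      The code at that line is not reproduced here. As a rough illustration of what "waiting for the view to update on all members" usually means, the sketch below polls a set of view suppliers until they all report the expected view or a timeout elapses. `ViewWait` and `awaitSameView` are hypothetical names made up for this example; they are not taken from jgroups-raft or JGroups.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only; not part of JGroups or jgroups-raft.
final class ViewWait {
    // Polls until every supplier returns `expected`, or until timeoutMs has elapsed.
    // Returns true if all views matched, false on timeout or interruption.
    static <V> boolean awaitSameView(V expected, long timeoutMs, long intervalMs,
                                     List<Supplier<V>> views) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            boolean allMatch = true;
            for (Supplier<V> view : views) {
                if (!Objects.equals(expected, view.get())) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch)
                return true;
            if (System.nanoTime() - deadline >= 0)
                return false; // at least one member never reached the expected view
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

      In the failure described here, a wait of this kind would time out because at least one remaining member never reports the final view.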

       

      The log file shows:

      -- shutdown channels
      21:41:04,159 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): D: change leader from null -> null
      21:41:04,159 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.p.GMS): D: sending LEAVE request to A
      21:41:04,159 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: handleMembershipChange([LEAVE(D)])
      21:41:04,160 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: joiners=[], suspected=[], leaving=[D], new view: [A|4] (3) [A, B, C]
      21:41:04,160 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: sending LEAVE response to D
      21:41:04,160 [DEBUG] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: installing view [A|4] (3) [A, B, C] (D left)
      21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.p.GMS): D: got LEAVE response from A in 1 ms
      21:41:04,160 [DEBUG] [jgroups-4,VoteTest,A] (o.j.p.r.ELECTION): A: existing view: [A|3] (4) [A, B, C, D], new view: [A|4] (3) [A, B, C], result: no_change
      21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): D: change leader from null -> null
      21:41:04,160 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: mcasting view [A|4], ref-view=[A|3], left=[D]
      21:41:04,160 [DEBUG] [jgroups-4,VoteTest,B] (o.j.p.p.GMS): B: installing view [A|4] (3) [A, B, C] (D left)
      21:41:04,160 [DEBUG] [jgroups-4,VoteTest,C] (o.j.p.p.GMS): C: installing view [A|4] (3) [A, B, C] (D left)
      21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): C: change leader from null -> null
      21:41:04,160 [DEBUG] [jgroups-4,VoteTest,B] (o.j.p.r.ELECTION): B: existing view: [A|3] (4) [A, B, C, D], new view: [A|4] (3) [A, B, C], result: no_change
      21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.p.GMS): C: last member in the group (coord); leaving now
      21:41:04,160 [DEBUG] [jgroups-4,VoteTest,C] (o.j.p.r.ELECTION): C: existing view: [A|3] (4) [A, B, C, D], new view: [A|4] (3) [A, B, C], result: no_change
      21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): C: change leader from null -> null
      

       

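      We see that node C tries to leave while it is concurrently installing the new view: the test thread logs "C: last member in the group (coord); leaving now" in the same millisecond in which C installs view [A|4], which still contains A, B and C. A few operations need synchronization: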

       

      This list is not exhaustive. I haven't read all the uses, so more places might need synchronization.
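
      As a minimal sketch of the kind of synchronization meant here, the example below guards view installation and the leave decision with the same lock, so the leaving thread never observes a half-updated membership. `GuardedMembership`, `installView` and `leaveTarget` are hypothetical names for this illustration; this is not the GMS implementation, and picking the first other member as the LEAVE target is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a member's view state; this is NOT the JGroups GMS class.
final class GuardedMembership {
    private final ReentrantLock lock = new ReentrantLock();
    private List<String> members = new ArrayList<>();
    private long viewId = -1;

    // Called by the thread that delivers a new view. Returns false for stale views.
    boolean installView(long newViewId, List<String> newMembers) {
        lock.lock();
        try {
            if (newViewId <= viewId)
                return false; // an equal or older view must not overwrite a newer one
            viewId = newViewId;
            members = new ArrayList<>(newMembers);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Called by the thread that disconnects. The decision is taken under the same
    // lock as installView, so it is based on one complete view.
    // Returns the member to send LEAVE to, or null if `local` is the last member.
    String leaveTarget(String local) {
        lock.lock();
        try {
            for (String member : members) {
                if (!member.equals(local))
                    return member; // simplification: first other member acts as coordinator
            }
            return null;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        GuardedMembership c = new GuardedMembership();
        c.installView(3, List.of("A", "B", "C", "D"));
        c.installView(4, List.of("A", "B", "C"));
        System.out.println(c.leaveTarget("C"));
    }
}
```

      With both operations serialized, a member in the position of C in the log above would decide how to leave based on either [A|3] or [A|4] in full, never on a mix of the two.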

            Assignee: Bela Ban (rhn-engineering-bban)
            Reporter: Jose Bolina (rh-ee-jbolina)