  1. JGroups
  2. JGRP-1167

Invalid MERGE_VIEW causes subgroups isolation


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 2.10
    • Affects Version/s: 2.9
    • None

    Description

      We are using WANem to test the latest JGroups 2.9 branch. In one of our tests we noticed the following behavior:

      1) Three nodes: Manager, Collector and Probe
      2) Manager has the Gossip Router running
      3) There is a WANem between Manager and Collector. The WANem rule randomly makes host/network unreachable for 30-45 sec every 3-5 minutes.
      4) Before the disconnect, all nodes had view V1 {manager, collector, probe}.
      5) After the disconnect, the views changed as follows:
         a) On Manager - V2 {manager, probe}
         b) On Collector - V3 {collector, probe}
         c) On Probe - V3 {collector, probe}
         Note: they all have the same view id (3).
      6) After 20 sec we got a MERGE_VIEW on Manager:
         a) On Manager - V4 {probe, manager}
         However, there is only one subgroup in it (V3).
      7) Even after the connection between Manager and Database is re-established (WANem rules deleted), the nodes do not receive a new merge view. Each node keeps whatever view it had, and all of them throw NAKACK errors (see attached log).

      The problem seems to be that view V2 has Probe as its coordinator, while view V3 has Collector as its coordinator. Probe itself has V3, so when Collector asks Probe which view it has, Probe reports the same view as Collector, and no merge is sent out to Manager. The result is a broken group with isolated subgroups.
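      The inconsistency described above can be reproduced on paper with a small model. The sketch below is not the JGroups API: `ViewCheck`, `SimpleView` and `isolatedNodes` are hypothetical names, and it assumes that the first member of a view acts as its coordinator (which is why Manager's V2 is written with probe first, matching the reported coordinator). It flags every node whose coordinator does not list that node in its own view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; none of these types come from JGroups.
final class ViewCheck {

    // A view is an id plus an ordered member list.
    // Assumption: the first member is the coordinator of the view.
    static final class SimpleView {
        final int id;
        final List<String> members;

        SimpleView(int id, String... members) {
            this.id = id;
            this.members = List.of(members);
        }

        String coordinator() {
            return members.get(0);
        }
    }

    // Returns the nodes whose coordinator's own view does not contain them.
    // Such a node believes it is in a group that its coordinator does not know about.
    static List<String> isolatedNodes(Map<String, SimpleView> viewByNode) {
        List<String> isolated = new ArrayList<>();
        for (Map.Entry<String, SimpleView> entry : viewByNode.entrySet()) {
            String node = entry.getKey();
            String coordinator = entry.getValue().coordinator();
            SimpleView coordinatorView = viewByNode.get(coordinator);
            if (coordinatorView == null || !coordinatorView.members.contains(node)) {
                isolated.add(node);
            }
        }
        return isolated;
    }

    public static void main(String[] args) {
        // Views as reported after the disconnect (step 5).
        Map<String, SimpleView> views = new LinkedHashMap<>();
        views.put("manager", new SimpleView(2, "probe", "manager"));
        views.put("collector", new SimpleView(3, "collector", "probe"));
        views.put("probe", new SimpleView(3, "collector", "probe"));

        System.out.println(isolatedNodes(views)); // prints [manager]
    }
}
```

      In this model only Manager is flagged: its coordinator (Probe) holds V3, which does not contain Manager, while Collector and Probe agree with each other and therefore see no reason to merge.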

      Questions
      --------
      1) How is it possible to get a MERGE_VIEW with only one subgroup in it?
      2) How can the coordinator of a view (as seen on Node A) hold a different view itself (on Node B, where Node B is that coordinator)?
      3) What is causing this node isolation, and which protocol is responsible (GMS, MERGE2)?

      Attached is the log showing the view changes on each node and our protocol stack.

      Attachments

        1. nodeIsolation_2.txt
          3 kB
          vivek v
        2. udpgossip-stack.xml
          3 kB
          vivek v

          People

            Assignee: vblagoje Vladimir Blagojevic (Inactive)
            Reporter: vivash vivek v (Inactive)