JGroups / JGRP-1326

Gossip Router dropping message for node that is in its routing table list


Details

    • Bug
    • Resolution: Won't Do
    • Major
    • 3.2
    • 2.10
    • None

    Description

      We are using the TUNNEL protocol with two Gossip Routers. For some reason we started seeing lots of suspect messages on all the nodes; there are 7 nodes in the group. Six of the nodes (including the coordinator) were suspecting node A (manager_172.27.75.11), and node A was suspecting the coordinator, but no new view was being created. After turning on TRACE logging on both Gossip Routers (GR1 and GR2), I see the following for every message sent to node A (manager_172.27.75.11):

         2011-05-20 15:56:21,186 TRACE [gossip-handlers-6] GossipRouter - cannot find manager_172.27.75.11:4576 in the routing table,
      routing table=
      172.27.75.11_group: probe_172.27.75.13:4576, collector_172.27.75.12:4576, probe_172.27.75.15:4576, manager_172.27.75.11:4576, probe_172.27.75.16:4576, probe_172.27.75.14:4576    
      

      The problem is that the routing table does indeed show "manager_172.27.75.11", so why is the GR dropping messages for that node? I suspect that the Gossip Router has somehow kept an old entry that was never cleaned up: a different UUID with the same logical address. I tried going through the GossipRouter.java code, but couldn't see how this would be possible.
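The stale-entry hypothesis can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not the GossipRouter implementation: the class `StaleEntryDemo`, the `NodeAddress` type, and the use of a String as a stand-in for a connection are all assumptions. It assumes that routing-table entries are matched by UUID while the table is printed by logical name, which is one way a trace line could report a node as missing while the dumped table appears to contain it.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical model of the suspected failure mode, NOT the real GossipRouter code:
 * entries are keyed by UUID but printed by logical name, so a stale entry from an
 * earlier incarnation of a node looks identical to the current one in a dump.
 */
public class StaleEntryDemo {

    /** Hypothetical address type: identity is the UUID, toString() is only the logical name. */
    static final class NodeAddress {
        final UUID uuid;
        final String logicalName;

        NodeAddress(UUID uuid, String logicalName) {
            this.uuid = uuid;
            this.logicalName = logicalName;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof NodeAddress && ((NodeAddress) o).uuid.equals(uuid);
        }

        @Override
        public int hashCode() {
            return uuid.hashCode();
        }

        @Override
        public String toString() {
            return logicalName;
        }
    }

    // group name -> (member address -> connection id); the String stands in for a socket
    private final Map<String, Map<NodeAddress, String>> table = new ConcurrentHashMap<>();

    void register(String group, NodeAddress addr, String connection) {
        table.computeIfAbsent(group, g -> new ConcurrentHashMap<>()).put(addr, connection);
    }

    /** Returns the connection for dest, or null if no entry has dest's UUID. */
    String lookup(String group, NodeAddress dest) {
        Map<NodeAddress, String> members = table.get(group);
        return members == null ? null : members.get(dest);
    }

    @Override
    public String toString() {
        return table.toString();
    }

    public static void main(String[] args) {
        StaleEntryDemo router = new StaleEntryDemo();
        String group = "172.27.75.11_group";
        String name = "manager_172.27.75.11:4576";

        // An earlier incarnation of node A registered and was never cleaned up.
        router.register(group, new NodeAddress(UUID.randomUUID(), name), "conn-1");

        // The current incarnation has a new UUID but the same logical name and,
        // in this hypothetical scenario, is not registered with this router.
        NodeAddress current = new NodeAddress(UUID.randomUUID(), name);

        if (router.lookup(group, current) == null) {
            // The lookup misses even though the printed table shows the same name.
            System.out.println("cannot find " + current + " in the routing table, routing table=" + router);
        }
    }
}
```

Because `equals` and `hashCode` use only the UUID while `toString` uses only the logical name, the dump and the lookup disagree; whether the real router behaves this way is exactly what the report is asking.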

      As I understand it, when there are multiple GRs, a node randomly chooses one of them for its communication. Each GR would then keep its own list of physical addresses for each node, so is it possible that it somehow uses the physical address instead of the UUID when cleaning up or retrieving the node list?
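To make the physical-address question concrete, here is a hypothetical sketch (the class `CleanupKeyDemo`, its methods, and the port number are assumptions, not JGroups code) that contrasts two cleanup strategies for a table mapping member UUIDs to physical addresses. When an old and a new incarnation of a node share a physical address, cleanup keyed by physical address also removes the live entry, whereas cleanup keyed by UUID, using `ConcurrentMap.remove(key, value)` so the entry is removed only if it still maps to the expected address, leaves the new entry intact. It does not reproduce the trace in this report; it only shows why the choice of cleanup key matters.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not GossipRouter code: contrasts cleaning up a routing
 * table by physical address with cleaning it up by UUID.
 */
public class CleanupKeyDemo {

    // member UUID -> physical address the member registered with
    private final Map<UUID, InetSocketAddress> members = new ConcurrentHashMap<>();

    void register(UUID id, InetSocketAddress physical) {
        members.put(id, physical);
    }

    /** Removes EVERY member that registered from this physical address. */
    void cleanupByPhysicalAddress(InetSocketAddress physical) {
        members.values().removeIf(physical::equals);
    }

    /** Removes only the given member, and only if it still maps to the expected address. */
    void cleanupByUuid(UUID id, InetSocketAddress physical) {
        members.remove(id, physical);
    }

    boolean contains(UUID id) {
        return members.containsKey(id);
    }

    public static void main(String[] args) {
        // Same host for both incarnations; 7800 is a hypothetical port.
        InetSocketAddress nodeA = InetSocketAddress.createUnresolved("172.27.75.11", 7800);
        UUID oldIncarnation = UUID.randomUUID();
        UUID newIncarnation = UUID.randomUUID();

        CleanupKeyDemo byPhysical = new CleanupKeyDemo();
        byPhysical.register(oldIncarnation, nodeA);
        byPhysical.register(newIncarnation, nodeA);
        byPhysical.cleanupByPhysicalAddress(nodeA); // intended to drop only the old entry
        System.out.println("by physical address, new entry kept: " + byPhysical.contains(newIncarnation));

        CleanupKeyDemo byUuid = new CleanupKeyDemo();
        byUuid.register(oldIncarnation, nodeA);
        byUuid.register(newIncarnation, nodeA);
        byUuid.cleanupByUuid(oldIncarnation, nodeA);
        System.out.println("by UUID, new entry kept: " + byUuid.contains(newIncarnation));
    }
}
```

Running `main` prints `by physical address, new entry kept: false` followed by `by UUID, new entry kept: true`.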

      This is causing a serious problem, and the only workaround is to restart the Gossip Routers.


          People

            vblagoje Vladimir Blagojevic (Inactive)
            vivash vivek v (Inactive)
            Votes: 0
            Watchers: 2
