  1. JGroups
  2. JGRP-1164

TCPGOSSIP doesn't maintain the Gossip Router state correctly


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 2.8.1, 2.10
    • 2.8, 2.9
    • None

      This came out of a discussion with Vladimir on JGRP-1162 (https://jira.jboss.org/jira/browse/JGRP-1162). Currently, TCPGossip.connect(..) calls RouterStubs.connect(..). The connect in the router stub simply opens the socket connection to the Gossip Router (GR) and then sends the "Connect" command without waiting for a response from the GR. Creating the socket creates the Connection Handler on the Gossip Router. The problem is that on a very lossy network the "Connect" command may be lost, in which case the GR never adds the node (the RouterStub node) to its list. Since the socket connection is established (and, because of keep-alive, will not be dropped even on a lossy network), the router stub assumes it is connected (it sets its state to CONNECTED if the socket is good), but it will not find itself in the list when it asks the GR for the nodes in the group.
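
      The mismatch described above can be modelled in a few lines. This is only an illustrative sketch, not the JGroups code: ModelRouter, FireAndForgetStub and the connectCommandDelivered flag are hypothetical names invented here to stand in for the Gossip Router, the router stub, and a lost "Connect" command.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Gossip Router: a node appears in the
// member list only after its CONNECT command has actually been processed.
class ModelRouter {
    private final Set<String> members = new HashSet<>();

    void onConnectCommand(String node) {
        members.add(node);
    }

    Set<String> getMembers() {
        return new HashSet<>(members);
    }
}

// Hypothetical stand-in for the router stub with the fire-and-forget
// behaviour from the report: state is derived from the socket alone.
class FireAndForgetStub {
    final String name;
    boolean socketOpen;
    boolean connected;

    FireAndForgetStub(String name) {
        this.name = name;
    }

    // connectCommandDelivered == false models the lost "Connect" command.
    void connect(ModelRouter router, boolean connectCommandDelivered) {
        socketOpen = true;                 // socket creation succeeded
        if (connectCommandDelivered) {
            router.onConnectCommand(name); // router registers the node
        }
        connected = socketOpen;            // no response is awaited
    }
}
```

      With connectCommandDelivered set to false, the stub reports connected while the router's member list does not contain it, which is the inconsistent state seen in the test.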

      In theory, a TCP connection should not lose packets, but in our experiments it did. I ran the test with WANem set to 50% packet loss and was able to reproduce the situation where the socket was created but there was no CONNECT. This caused the Gossip Router to publish a wrong node list.

      Two possible fixes:
      1) Treat the node as CONNECTED as soon as the socket connection is made, so that no separate CONNECT command has to be sent.
         • This is not a good design, as the state should be kept by the application layer.
      2) Wait for the Gossip Router's response to CONNECT (and likewise to DISCONNECT) before setting TCPGOSSIP to the CONNECTED state.
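
      Option 2 could look roughly like the sketch below. It is an assumption-laden illustration rather than the actual fix: AckingStub, the "CONNECT_OK" reply, the timeout, and the retry count are all hypothetical, and the response queue stands in for whatever reader thread would deliver the Gossip Router's replies from the socket.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stub that only becomes CONNECTED after the router
// acknowledges the CONNECT command.
class AckingStub {
    enum State { DISCONNECTED, CONNECTING, CONNECTED }

    private volatile State state = State.DISCONNECTED;

    // Replies read from the router socket would be queued here
    // (for example by a reader thread); assumed, not JGroups API.
    private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();

    void onResponse(String msg) {
        responses.offer(msg);
    }

    // sendConnect writes the CONNECT command to the router.
    // Returns true only if an acknowledgment arrived in time.
    boolean connect(Runnable sendConnect, long timeoutMs, int maxAttempts) {
        state = State.CONNECTING;
        try {
            for (int i = 0; i < maxAttempts; i++) {
                sendConnect.run();
                String reply = responses.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if ("CONNECT_OK".equals(reply)) { // hypothetical ack message
                    state = State.CONNECTED;
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve interrupt status
        }
        state = State.DISCONNECTED;               // never acknowledged
        return false;
    }

    State getState() {
        return state;
    }
}
```

      The same wait-for-acknowledgment pattern would apply to DISCONNECT. Whether to retry or to close the socket and reconnect after a timeout is a design choice this sketch leaves open.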

            vblagoje Vladimir Blagojevic (Inactive)
            vivash vivek v (Inactive)
