Details
-
Bug
-
Resolution: Done
-
Major
-
3.1
-
None
Description
We are observing a race condition in TCPGOSSIP based discovery.
A is gossip router and B is a node connecting to gossip router.
Timer-1 on B starts connecting, does ping.
Timer-2 on B starts reconnecting
A sees this new connection and closes the old one from Timer-1 from B.
Timer-2 on B does ping
Timer-1 on B tries to do gossip_get and encounters broken pipe as the socket
is closed.
This process goes on forever.
Logs from Gossip Router A
=========================
2012-12-14 13:20:47,687 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter-
Accepted connection, socket is
Socket[addr=/192.168.1.11,port=21009,localport=12001]
2012-12-14 13:20:47,687 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering
receive loop
2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received
CONNECT(group=TestCluster, addr=dd-ace1-20957,
logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800)
2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer:
/192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for
addr[dd-ace1-20957]. Closing old connection ...
2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] connection handshake completed, added
dd-ace1-20957 to group TestCluster
2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster,
addr=dd-ace1-20957, logical_name=dd-ace1-20957,
physical_addrs=192.168.1.11:7800)
2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] received PING(group=null, addr=null)
2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] processed PING(group=null, addr=null)
2012-12-14 13:20:53,735 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] received CONNECT(group=TestCluster,
addr=dd-ace1-20957, logical_name=dd-ace1-20957,
physical_addrs=192.168.1.11:7800)
2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer:
/192.168.1.11, logical_addrs: dd-ace1-20957]] for addr[dd-ace1-20957]. Closing
old connection ...
2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] is being closed
2012-12-14 13:20:53,838 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957, dd-ace1-20957] connection handshake completed,
added dd-ace1-20957 to group TestCluster
2012-12-14 13:20:53,838 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957, dd-ace1-20957] processed
CONNECT(group=TestCluster, addr=dd-ace1-20957,
logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800)
2012-12-14 13:21:04,023 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter-
Accepted connection, socket is
Socket[addr=/192.168.1.11,port=37357,localport=12001]
2012-12-14 13:21:04,023 DEBUG [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering
receive loop
2012-12-14 13:21:04,085 TRACE [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received
CONNECT(group=TestCluster, addr=dd-ace1-20957,
logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800)
2012-12-14 13:21:04,085 DEBUG [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer:
/192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for
addr[dd-ace1-20957]. Closing old connection ...
2012-12-14 13:21:04,086 DEBUG [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] connection handshake completed, added
dd-ace1-20957 to group TestCluster
2012-12-14 13:21:04,086 TRACE [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster,
addr=dd-ace1-20957, logical_name=dd-ace1-20957,
physical_addrs=192.168.1.11:7800)
Logs from Gossip Client B
=========================
2012-12-14 13:20:47,687 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter-
Accepted connection, socket is
Socket[addr=/192.168.1.11,port=21009,localport=12001]
2012-12-14 13:20:47,687 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering
receive loop
2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received
CONNECT(group=TestCluster, addr=dd-ace1-20957,
logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800)
2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer:
/192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for
addr[dd-ace1-20957]. Closing old connection ...
2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] connection handshake completed, added
dd-ace1-20957 to group TestCluster
2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster,
addr=dd-ace1-20957, logical_name=dd-ace1-20957,
physical_addrs=192.168.1.11:7800)
2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] received PING(group=null, addr=null)
2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] processed PING(group=null, addr=null)
2012-12-14 13:20:53,735 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] received CONNECT(group=TestCluster,
addr=dd-ace1-20957, logical_name=dd-ace1-20957,
physical_addrs=192.168.1.11:7800)
2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer:
/192.168.1.11, logical_addrs: dd-ace1-20957]] for addr[dd-ace1-20957]. Closing
old connection ...
2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] is being closed
2012-12-14 13:20:53,838 DEBUG [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957, dd-ace1-20957] connection handshake completed,
added dd-ace1-20957 to group TestCluster
2012-12-14 13:20:53,838 TRACE [gossip-handlers--9882][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957, dd-ace1-20957] processed
CONNECT(group=TestCluster, addr=dd-ace1-20957,
logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800)
2012-12-14 13:21:04,023 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter-
Accepted connection, socket is
Socket[addr=/192.168.1.11,port=37357,localport=12001]
2012-12-14 13:21:04,023 DEBUG [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering
receive loop
2012-12-14 13:21:04,085 TRACE [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received
CONNECT(group=TestCluster, addr=dd-ace1-20957,
logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800)
2012-12-14 13:21:04,085 DEBUG [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer:
/192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for
addr[dd-ace1-20957]. Closing old connection ...
2012-12-14 13:21:04,086 DEBUG [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] connection handshake completed, added
dd-ace1-20957 to group TestCluster
2012-12-14 13:21:04,086 TRACE [gossip-handlers--9881][]
org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11,
logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster,
addr=dd-ace1-20957, logical_name=dd-ace1-20957,
physical_addrs=192.168.1.11:7800)