Details

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.2.8
Affects Version/s: 2.2.9
Labels:
None

SourceForge Reference:
https://na1.salesforce.com/50030000000braz

SFDC Cases Counter:
SFDC Cases Links:

Description

Physical hosts "A" (192.168.1.1, coordinator) and "B" (192.168.1.2) run JGroups processes configured with TCP/TCPPING stacks.

"A" stack configuration:

TCP(bind_addr=192.168.1.1;start_port=11800;loopback=true):
TCPPING(initial_hosts=192.168.1.2[11800];port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
MERGE2(min_interval=5000;max_interval=10000):
FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true):
VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):
pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false;down_thread=true;up_thread=true)

"B" stack configuration:

TCP(bind_addr=192.168.1.2;start_port=11800;loopback=true):
TCPPING(initial_hosts=192.168.1.1[11800];port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
MERGE2(min_interval=5000;max_interval=10000):
FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true):
VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):
pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false;down_thread=true;up_thread=true)
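The two stacks are meant to be symmetric. The minimal sketch below checks that mechanically. It parses each property string as colon-separated `NAME(key=value;...)` layers and lists every property whose value differs between A and B. The class and method names are hypothetical helpers, not part of JGroups. The parser assumes no value contains a `:`, which holds for the IPv4 strings in this report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class StackDiff {
    // Property strings copied from the report ("A" and "B" stacks).
    static final String A =
        "TCP(bind_addr=192.168.1.1;start_port=11800;loopback=true):"
      + "TCPPING(initial_hosts=192.168.1.2[11800];port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):"
      + "MERGE2(min_interval=5000;max_interval=10000):"
      + "FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true):"
      + "VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):"
      + "pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):"
      + "pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):"
      + "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false;down_thread=true;up_thread=true)";
    static final String B =
        "TCP(bind_addr=192.168.1.2;start_port=11800;loopback=true):"
      + "TCPPING(initial_hosts=192.168.1.1[11800];port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):"
      + "MERGE2(min_interval=5000;max_interval=10000):"
      + "FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true):"
      + "VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):"
      + "pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):"
      + "pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):"
      + "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false;down_thread=true;up_thread=true)";

    // Parses "NAME(k=v;k=v):NAME(...)" into protocol name -> (key -> value).
    static Map<String, Map<String, String>> parse(String stack) {
        Map<String, Map<String, String>> layers = new LinkedHashMap<>();
        for (String layer : stack.split(":")) {
            layer = layer.trim();
            int open = layer.indexOf('(');
            String name = open < 0 ? layer : layer.substring(0, open);
            Map<String, String> props = new LinkedHashMap<>();
            if (open >= 0) {
                String body = layer.substring(open + 1, layer.lastIndexOf(')'));
                for (String kv : body.split(";")) {
                    if (kv.isEmpty()) continue;
                    int eq = kv.indexOf('=');
                    if (eq < 0) props.put(kv, "");
                    else props.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
            layers.put(name, props);
        }
        return layers;
    }

    // Returns "PROTOCOL.key: valueA -> valueB" for each differing property, sorted by name.
    static List<String> diff(Map<String, Map<String, String>> a, Map<String, Map<String, String>> b) {
        List<String> out = new ArrayList<>();
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        for (String name : names) {
            Map<String, String> pa = a.getOrDefault(name, Map.of());
            Map<String, String> pb = b.getOrDefault(name, Map.of());
            Set<String> keys = new TreeSet<>(pa.keySet());
            keys.addAll(pb.keySet());
            for (String k : keys) {
                if (!Objects.equals(pa.get(k), pb.get(k))) {
                    out.add(name + "." + k + ": " + pa.get(k) + " -> " + pb.get(k));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : diff(parse(A), parse(B))) System.out.println(line);
    }
}
```

Run against the two strings above, only `TCP.bind_addr` and `TCPPING.initial_hosts` differ. Both stacks therefore share the same `port_range`, FD, and VERIFY_SUSPECT settings, yet the two hosts behave differently.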

If I unplug B's network cable, the B stack immediately and correctly identifies A as suspect and installs a new view containing only itself.
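"Immediately" can be bounded roughly from the stack settings. The sketch below is a rough estimate, not a guarantee of JGroups behaviour, and it ignores network and GMS processing delays. It assumes the following:

- FD pings its neighbour every `timeout` ms and raises a suspicion after `max_tries` unanswered pings.
- VERIFY_SUSPECT then waits its own `timeout` before passing the SUSPECT event up.

The class name is a hypothetical helper.

```java
class DetectionBudget {
    // Approximate worst-case time (ms) from losing the peer to a confirmed SUSPECT,
    // under the assumed FD/VERIFY_SUSPECT semantics described above.
    static long worstCaseSuspectMs(long fdTimeoutMs, int fdMaxTries, long verifyTimeoutMs) {
        return fdTimeoutMs * fdMaxTries + verifyTimeoutMs;
    }

    public static void main(String[] args) {
        // FD(timeout=1500;max_tries=3) + VERIFY_SUSPECT(timeout=1500) from both stacks.
        long budget = worstCaseSuspectMs(1500, 3, 1500);
        System.out.println("Expected suspicion within about " + budget + " ms");
    }
}
```

With both hosts using identical settings, A should suspect B within roughly the same ~6 s that B needs to suspect A. A view (A, B) that survives well beyond that window points to something delaying the SUSPECT on A rather than to the timeouts themselves.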

However, A does not recognize B as suspect and nondeterministically emits various INFO and WARN messages. The view (A, B) incorrectly remains valid for a long time; sometimes it is eventually replaced by (A), sometimes not.

I tracked the cause of the problem down to A's TCPPING configuration and the TCP queue. If A's TCPPING is configured with port_range=1, the problem goes away and the new view is immediately installed in A's stack. It seems that if the TCP queue holds messages other than the SUSPECT message generated by FD, they interfere with it: the SUSPECT message gets stuck in the queue, with nondeterministic results.
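The hypothesis above can be illustrated with a deterministic model. The sketch rests on three assumptions, none confirmed by this report or by JGroups sources:

- `port_range` counts the total number of consecutive ports probed, starting at the port in `initial_hosts`. This matches port_range=1 probing only 11800.
- Each probe to the now-unreachable host occupies a single FIFO sender for a hypothetical connect timeout.
- The SUSPECT message is queued behind those probes.

Under these assumptions, the SUSPECT waits for the sum of the blocking times ahead of it (head-of-line blocking). The class, methods, and the 3000 ms figure are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class ProbeQueueSketch {
    // ASSUMPTION: port_range = number of ports probed, starting at the listed port.
    static List<String> probeTargets(String host, int port, int portRange) {
        List<String> targets = new ArrayList<>();
        for (int p = port; p < port + portRange; p++) {
            targets.add(host + ":" + p);
        }
        return targets;
    }

    // One FIFO queue drained by one sender: a message queued last starts only
    // after every earlier entry has finished, so its wait is the sum of their costs.
    static long waitBeforeSuspect(List<Long> costsAheadMs) {
        long total = 0;
        for (long c : costsAheadMs) total += c;
        return total;
    }

    public static void main(String[] args) {
        long connectTimeoutMs = 3000; // HYPOTHETICAL blocking time per unreachable probe
        for (int range : new int[] {1, 3}) {
            List<String> probes = probeTargets("192.168.1.2", 11800, range);
            List<Long> costs = new ArrayList<>();
            for (String ignored : probes) costs.add(connectTimeoutMs);
            System.out.println("port_range=" + range + " probes " + probes
                + " -> SUSPECT waits ~" + waitBeforeSuspect(costs) + " ms");
        }
    }
}
```

In this model, port_range=3 triples the work queued ahead of the SUSPECT compared with port_range=1. Two of those probes (ports 11801 and 11802) target ports where B is not expected to listen at all. Real behaviour will also depend on OS-level TCP timeouts and on how JGroups' TCP transport actually schedules sends.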

People

Assignee:: Ovidiu Feodorov (Inactive)

Reporter:: Ovidiu Feodorov (Inactive)

Votes:: 0

Watchers:: 0

Dates

Created:: 2005/02/18 12:15 AM

Updated:: 2006/08/24 11:13 AM

Resolved:: 2005/04/07 9:27 AM