Details

Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: 2.6.11, 2.8
Affects Version/s: 2.6.3, 2.6.4, 2.6.5, 2.7
Labels: None

SourceForge Reference:
http://sourceforge.net/forum/message.php?msg_id=7404809

SFDC Cases Counter:
SFDC Cases Links:

Description

I am experiencing a problem with JGroups when a node tries to join an existing cluster.

Occasionally, a new node joining an existing cluster logs the following warning:

2009-05-21 12:04:02,568 [main] WARN org.jgroups.protocols.pbcast.GMS:144 - join(callisto.tmca.com.au-18715) sent to callisto.tmca.com.au-8185 timed out (after 3000 ms), retrying

The number of retries varies from a couple of attempts to retrying indefinitely.

While debugging the code, I discovered that before processing a join the coordinator performs a GMS flush, and unless that flush succeeds it does not reply with a join response.

Sure enough, the coordinator logs this:
2009-05-21 12:05:25,902 [ViewHandler,callisto.tmca.com.au-8185] WARN org.jgroups.protocols.pbcast.GMS:749 - GMS flush by coordinator at callisto.tmca.com.au-8185 failed
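
The interaction described above can be illustrated with a small toy model. This is only a sketch and not JGroups code: the class name JoinRetrySketch, both method names, and the maxAttempts cap are hypothetical and exist purely for illustration (the report describes retries with no upper bound). It shows why the joiner never gets a join response while the coordinator's flush keeps failing.

```java
import java.util.function.BooleanSupplier;

/** Toy model of the join handshake described in this report; not JGroups code. */
public class JoinRetrySketch {

    /** Hypothetical coordinator: it sends a join response only if its flush succeeds. */
    static boolean coordinatorHandlesJoin(BooleanSupplier flush) {
        if (!flush.getAsBoolean()) {
            // Corresponds to "GMS flush by coordinator ... failed": no response is sent.
            return false;
        }
        return true; // flush succeeded, join response sent
    }

    /**
     * Hypothetical joiner: retries until a response arrives or maxAttempts is reached.
     * Returns the number of attempts used, or -1 if every attempt went unanswered.
     * The maxAttempts cap is an assumption added so that the sketch terminates.
     */
    static int join(BooleanSupplier flush, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (coordinatorHandlesJoin(flush)) {
                return attempt;
            }
            // In the report, this is where the joiner logs "timed out (after 3000 ms), retrying".
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Flush fails twice, then succeeds: the join completes on the third attempt.
        System.out.println(join(() -> ++calls[0] > 2, 10));
        // Flush never succeeds: every attempt goes unanswered.
        System.out.println(join(() -> false, 5));
    }
}
```

In this model a flush that never succeeds leaves the joiner retrying until the cap is hit, which corresponds to the endless "timed out ... retrying" warnings seen on the joining node.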

I originally came across this problem in our production environment with 2.6.3, where I have been able to reproduce it reliably. I have also tested with 2.7.0 and 2.8.0.alpha3: retries still occur, but the join generally sorts itself out within a minute. However, I have found that retries can still continue indefinitely on 2.8.0 if the test is repeated often enough.

Attachments

jgroup.tar.gz (4 kB, 2009/05/24 8:49 PM)

Issue Links

relates to

JGRP-1007 Flush: change signature of JChannel#startFlush to include checked exception

Resolved

People

Assignee: Vladimir Blagojevic (Inactive)

Reporter: Ronn C (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 2009/05/24 8:48 PM

Updated: 2009/10/14 10:53 PM

Resolved: 2009/07/02 8:09 AM