Details

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.6.3, 2.7
Affects Version/s: 2.6.3
Labels: None

Workaround Description:

Add sleep after connect and avoid concurrent channel startup
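One way this workaround could be sketched, assuming all channels are started from the same JVM (as the attached test appears to do). `SerializedStartup` and `connectOneAtATime` are hypothetical names, not JGroups API; `connectStep` stands in for whatever code performs the connect.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of JGroups: serializes startup within one JVM.
final class SerializedStartup {
    // Shared by every caller, so only one connect step runs at a time.
    private static final ReentrantLock STARTUP_LOCK = new ReentrantLock();

    private SerializedStartup() {}

    /**
     * Runs connectStep while holding the startup lock, then sleeps for
     * settleMillis before releasing it, so the next member only starts
     * after the previous one has had time to settle.
     * Returns false if the thread was interrupted during the pause.
     */
    static boolean connectOneAtATime(Runnable connectStep, long settleMillis) {
        STARTUP_LOCK.lock();
        try {
            connectStep.run();          // assumption: wraps the real connect call
            Thread.sleep(settleMillis); // the "sleep after connect" part
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            STARTUP_LOCK.unlock();
        }
    }
}
```

This only hides the race by spacing out the joins; it does not remove the underlying deadlock.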


Description

We've had more trouble with concurrent startup and now believe we've
isolated a deadlock between FLUSH and GroupRequest that occurs when
several members start at the same time.

We have four boxes that join a channel and use MessageDispatcher
immediately after connecting. This frequently blocks indefinitely.

GroupRequest.execute() obtains a lock, then a subsequent view change
arrives and does likewise. The result is that all Incoming threads end
up blocked waiting for the lock, which can only be released when a
stop_flush message is processed. With every incoming thread blocked,
that never happens.
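The shape of this starvation can be reproduced outside JGroups. The following is a simplified stand-in, not JGroups code: a fixed pool plays the role of the incoming threads, a latch plays the role of the lock, and the task that would release it (the stop_flush stand-in) is queued on the same pool. All names here are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: every pool thread waits for an event that can only be
// delivered by a task queued on that same pool.
final class IncomingPoolStarvationSketch {
    private IncomingPoolStarvationSketch() {}

    /** Returns true if the releasing task managed to run within timeoutMs. */
    static boolean releaseRuns(int poolSize, int blockedTasks, long timeoutMs) {
        ExecutorService incoming = Executors.newFixedThreadPool(poolSize);
        CountDownLatch stopFlush = new CountDownLatch(1);
        try {
            // Each of these occupies one "incoming" thread until stopFlush fires.
            for (int i = 0; i < blockedTasks; i++) {
                incoming.execute(() -> {
                    try {
                        stopFlush.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The only task that can release the waiters needs a pool thread too.
            incoming.execute(stopFlush::countDown);
            return stopFlush.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            incoming.shutdownNow(); // interrupt any threads still waiting
        }
    }
}
```

With as many blocked tasks as pool threads, the releasing task never gets a thread and the call times out; with one spare thread, it completes.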

In the attached unit test, adding the following after the call to connect("A") makes the test pass, which suggests a timing-dependent deadlock:

if (j == 0) {
    Thread.sleep(500);
}

Additionally, and more speculatively, the wait/notify code in pbcast does not seem to account for spurious wakeups. I don't know under what circumstances they occur, and I don't believe we're seeing them at present, but this should be fixed at some stage.
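For reference, the usual guard against spurious wakeups is to re-check the condition in a loop around wait(). The sketch below is not the pbcast code; `FlushGate` and its members are hypothetical names chosen for illustration.

```java
// Illustration only: a condition wait that tolerates spurious wakeups.
final class FlushGate {
    private final Object lock = new Object();
    private boolean flushInProgress = true;

    /** Waits until stopFlush() is called; returns false on timeout or interrupt. */
    boolean awaitFlushEnd(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            // Loop, not "if": a wakeup alone does not prove the flush has ended.
            while (flushInProgress) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                try {
                    lock.wait(remainingNanos / 1_000_000L,
                              (int) (remainingNanos % 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    /** Marks the flush as finished and wakes all waiters. */
    void stopFlush() {
        synchronized (lock) {
            flushInProgress = false;
            lock.notifyAll();
        }
    }
}
```

Recomputing the remaining time on each pass keeps the overall timeout bounded even if the thread is woken early several times.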

Attachments

ConcurrentStartupWithGroupRequestTest.java (2 kB, 2008/05/01 6:37 AM)
ConcurrentStartupWithGroupRequestTest.java (2 kB, 2008/05/01 6:31 AM)
stacktrace.txt (73 kB, 2008/05/01 6:33 AM)

People

Assignee: Bela Ban

Reporter: Robert Newson (Inactive)

Votes: 0

Watchers: 0

Dates

Created: 2008/05/01 6:29 AM

Updated: 2008/05/22 10:01 AM

Resolved: 2008/05/22 10:01 AM