Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.6.12, 2.8
Affects Version/s: 2.6.9, 2.6.10, 2.6.11
Labels: None

When a new node joins a cluster and requests a join-and-state-transfer, coordinator C receives the request and calls startFlush (CoordGmsImpl.java:400). The startFlush call might fail due to a timeout (see FLUSH#startFlush(List<Address> flushParticipants)). In that case the function returns false and the caller issues startFlush again. However, since the flushInProgress switch is still true, all repeated startFlush requests fail. Meanwhile, the joining node goes into a connect loop: it never receives a new view, because the flush failed at coordinator C, and so it never issues the stopFlush that would switch flushInProgress back to false. Coordinator C therefore never leaves its stuck state, and the joining node stays stuck as well. The end result is a permanent lockup.
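The failure mode described above can be sketched as a small standalone model. This is an illustration only, not the JGroups FLUSH code: the class FlushGuardSketch, its method names, and the BooleanSupplier standing in for a flush round are all hypothetical. It shows how a flag that is set on entry but not cleared on timeout rejects every retry, and one possible remedy (clearing the flag when the round fails), which is not necessarily what was changed in 2.6.12 / 2.8.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical model of the flush guard; NOT the JGroups FLUSH class. */
class FlushGuardSketch {
    private final AtomicBoolean flushInProgress = new AtomicBoolean(false);

    /** Behaviour described in the report: the flag stays set after a timeout. */
    boolean startFlushBuggy(BooleanSupplier flushRound) {
        if (!flushInProgress.compareAndSet(false, true)) {
            return false; // a flush is believed to be running already
        }
        // A return value of false models a timeout; the flag is left true.
        return flushRound.getAsBoolean();
    }

    /** One possible remedy (an assumption): clear the flag when the round fails. */
    boolean startFlushResetting(BooleanSupplier flushRound) {
        if (!flushInProgress.compareAndSet(false, true)) {
            return false;
        }
        boolean ok = flushRound.getAsBoolean();
        if (!ok) {
            flushInProgress.set(false); // let the caller's retry proceed
        }
        return ok;
    }

    /** Models the stopFlush that the joining node never gets to send. */
    void stopFlush() {
        flushInProgress.set(false);
    }

    boolean isFlushInProgress() {
        return flushInProgress.get();
    }

    public static void main(String[] args) {
        FlushGuardSketch buggy = new FlushGuardSketch();
        boolean first = buggy.startFlushBuggy(() -> false); // simulated timeout
        boolean retry = buggy.startFlushBuggy(() -> true);  // rejected by the stale flag
        System.out.println("buggy: first=" + first + " retry=" + retry);

        FlushGuardSketch fixed = new FlushGuardSketch();
        first = fixed.startFlushResetting(() -> false);     // simulated timeout
        retry = fixed.startFlushResetting(() -> true);      // retry can run
        System.out.println("resetting: first=" + first + " retry=" + retry);
    }
}
```

In the buggy variant the retry fails even though the second round would have succeeded, and only an explicit stopFlush() can clear the flag, which matches the lockup in the report where that stopFlush never arrives.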

Assignee: Vladimir Blagojevic (Inactive)

Reporter: Vladimir Blagojevic (Inactive)

Votes: 0

Watchers: 0

Created: 2009/08/17 11:19 AM

Updated: 2009/08/17 3:00 PM

Resolved: 2009/08/17 3:00 PM
