Details
Bug
Resolution: Done
Major
3.1
None
Description
I've just seen a deadlock in SEQUENCER, which I think occurs as follows:
- Start of day. coord=null, ack_mode=true, flushing=false.
- First view arrives. We call handleViewChange(), and start a Flusher.
- On the Flusher thread, flush() gets as far as waiting to acquire the send_lock, but doesn't yet have it.
- Meanwhile on Thread 2, the application tries broadcasting a message. This gets as far as the trace "forwarding my-address::1 to coord null", but does not yet enter forwardToCoord().
- Now on Thread 3, a second view arrives. handleViewChange() finds that coord_changed is true, and calls stopFlusher(). This sets flushing=false.
- Now Thread 2 picks up. flushing=false, ack_mode=true; so forwardToCoord() gets as far as acquiring the send_lock.
- Now Thread 2 loops around making no progress. forward() always drops the message, because coord is null. The send_lock is never relinquished.
- So the Flusher thread can never acquire the send_lock, and the Flusher can't exit.
- And so Thread 3 is stuck too, in stopFlusher().
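
The core of the sequence above can be reproduced in isolation. The following is only a minimal sketch, not the real SEQUENCER code: the class name, the `stopSender` flag and the `flusherAcquiresLock()` helper are hypothetical, and `forward()` / `forwardToCoord()` are simplified stand-ins for the methods named in the description. It shows a sender that holds the send lock while retrying against a null coord, and a second thread (playing the Flusher) that therefore cannot acquire the lock. A timed `tryLock` and the `stopSender` flag are used purely so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the reported hang; names are illustrative assumptions.
class SequencerDeadlockSketch {
    static final ReentrantLock sendLock = new ReentrantLock();
    static volatile Object coord = null;          // no coordinator known yet
    static volatile boolean stopSender = false;   // demo-only escape hatch

    // Stand-in for forward(): the message is dropped while coord is null.
    static boolean forward(String msg) {
        return coord != null;
    }

    // Stand-in for forwardToCoord() in ack mode: keeps the send lock
    // for as long as the message has not been forwarded.
    static void forwardToCoord(String msg, CountDownLatch lockHeld) {
        sendLock.lock();
        try {
            lockHeld.countDown();                 // signal that the lock is now held
            while (!forward(msg) && !stopSender) {
                try {
                    Thread.sleep(10);             // retry; never succeeds while coord is null
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        } finally {
            sendLock.unlock();
        }
    }

    // Plays the Flusher: returns true only if the send lock could be
    // acquired within timeoutMs while the sender is stuck retrying.
    static boolean flusherAcquiresLock(long timeoutMs) {
        coord = null;
        stopSender = false;
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread sender = new Thread(() -> forwardToCoord("my-address::1", lockHeld));
        sender.start();
        boolean acquired = false;
        try {
            lockHeld.await();                     // wait until the sender owns the lock
            acquired = sendLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                sendLock.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            stopSender = true;                    // release the sender so the demo ends
            try {
                sender.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("flusher acquired send_lock: " + flusherAcquiresLock(200));
    }
}
```

In the real protocol there is no timeout and no escape flag, so the Flusher would wait on the lock indefinitely, and the thread in stopFlusher() would wait on the Flusher in turn.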