A race condition in JGroups could cause a channel that should be closed (for example, after being shunned)
to never be closed.
To stop the TimeScheduler thread, CloserThread sets that thread's status
to interrupted. If the interruption occurs while TimeScheduler is waiting, there is no problem.
However, in TimeScheduler._run(), the task itself is executed via task.run() outside the
synchronized(queue) block, which means that CloserThread could set the TimeScheduler thread's
status to interrupted while a task is running, for example while sending an FD are-you-alive message.
If every protocol below the one carrying out the task has its down thread set to false, the task is
carried down the stack on the TimeScheduler thread itself, so an interruption that arrives while the
task is running could be caught while sending a message to the network:
TP (UDP and TCP/TCP_NIO's parent):
TP.down(Event evt)
    ....
    try {
        if(use_outgoing_packet_handler)
            outgoing_queue.put(msg);
        else
            send(msg, dest, multicast);
    }
    catch(QueueClosedException closed_ex) {
    }
    catch(InterruptedException interruptedEx) {
    }
    catch(Throwable e) {
        if(log.isErrorEnabled()) log.error("failed sending message", e);
    }
The thread's interrupted status is cleared when InterruptedException is thrown, so catching the
exception and doing nothing loses the interrupt. If the interruption from CloserThread is caught here,
the TimeScheduler thread will never finish, leaving the channel blocked, never rejoining the cluster
and unable to merge back.
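
The effect described above can be reproduced outside JGroups. The following is only a minimal sketch, not the TimeScheduler or TP code: the class name LostInterruptSketch, the method stopsAfterInterrupt, the giveUp flag and the use of Thread.sleep() as a stand-in for a blocking send are all assumptions made for illustration. A loop that exits only when it sees its interrupted status is interrupted while its "task" is blocked; the task either swallows the InterruptedException (as in the empty catch block) or restores the status with Thread.currentThread().interrupt().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class LostInterruptSketch {

    /**
     * Starts a scheduler-like thread, interrupts it while its task is blocked,
     * and reports whether the thread terminated as a result.
     */
    static boolean stopsAfterInterrupt(boolean restoreInterrupt) throws InterruptedException {
        CountDownLatch taskStarted = new CountDownLatch(1);
        AtomicBoolean giveUp = new AtomicBoolean(false); // cleanup switch for this demo only

        Thread scheduler = new Thread(() -> {
            // Like a scheduler loop, this exits only once it observes the interrupt.
            while (!giveUp.get() && !Thread.currentThread().isInterrupted()) {
                taskStarted.countDown();
                try {
                    Thread.sleep(10_000); // hypothetical stand-in for a blocking send
                } catch (InterruptedException e) {
                    if (restoreInterrupt) {
                        Thread.currentThread().interrupt(); // put the status back
                    }
                    // otherwise the interrupt is swallowed, as in an empty catch block
                }
            }
        });
        scheduler.setDaemon(true);
        scheduler.start();

        taskStarted.await();   // the task is running (or about to block)
        scheduler.interrupt(); // stands in for CloserThread
        scheduler.join(500);   // give the thread a moment to terminate
        boolean stopped = !scheduler.isAlive();

        if (!stopped) {        // release the stuck thread so the demo can exit
            giveUp.set(true);
            scheduler.interrupt();
            scheduler.join();
        }
        return stopped;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("swallowed: stopped=" + stopsAfterInterrupt(false));
        System.out.println("restored:  stopped=" + stopsAfterInterrupt(true));
    }
}
```

With the interrupt swallowed, the loop should still be alive after the interrupt; with the status restored, it should terminate. Whether restoring the status in TP.down() is the right fix for JGroups itself is a separate question that this sketch does not answer.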
Related to: JGRP-486 UNICAST: messages not retransmitted on load (Resolved)