Infinispan / ISPN-2966

NBST: Concurrent leavers can lead to deadlock


      The following sequence of events leads to a thread deadlock in the coordinator:

      1) NodeF sends a LEAVE message; new topologyId=8
      2) NodeE delivers REBALANCE_START(8)
      3) NodeF and NodeG deliver REBALANCE_START(8)
      4) NodeH delivers GET_TRANSACTION(8) from NodeE ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeE-28744 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
      5) NodeH sends a LEAVE message; new topologyId=9
      6) NodeH delivers REBALANCE_START(8) ==> Ignoring rebalance 8 for cache dist that doesn't exist locally
      7) NodeH delivers GET_TRANSACTION(8) from NodeG ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeG-31669 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
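      A simplified model can make the hang on NodeH concrete. This is a sketch under stated assumptions, not Infinispan code: `TopologyWaitDemo`, `NodeTopology` and its methods are hypothetical, and the model assumes that a leaving node drops REBALANCE_START (as the step 6 log line suggests), so a GET_TRANSACTION handler waiting for topology 8 (step 7) can only give up through a timeout. For brevity it marks NodeH as leaving before the handler starts waiting, whereas in the trace the step 4 request arrives first.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified model of the events above; not Infinispan's API. */
public class TopologyWaitDemo {

    /** Hypothetical per-node topology state. */
    static final class NodeTopology {
        private int topologyId;
        private boolean leaving;

        NodeTopology(int initialTopologyId) {
            this.topologyId = initialTopologyId;
        }

        // Step 5: the node has sent LEAVE.
        synchronized void markLeaving() {
            leaving = true;
        }

        // Step 6 (assumed behaviour): a leaving node ignores REBALANCE_START,
        // so its local topology never advances.
        synchronized boolean onRebalanceStart(int newTopologyId) {
            if (leaving) {
                return false; // "Ignoring rebalance ... that doesn't exist locally"
            }
            if (newTopologyId > topologyId) {
                topologyId = newTopologyId;
                notifyAll(); // wake handlers waiting for this topology
            }
            return true;
        }

        // Steps 4 and 7: block until the requested topology is installed locally.
        // Returns false if the timeout elapses first.
        synchronized boolean waitForTopology(int requested, long timeoutMillis)
                throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (topologyId < requested) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NodeTopology nodeH = new NodeTopology(7);
        nodeH.markLeaving();                                // step 5
        boolean applied = nodeH.onRebalanceStart(8);        // step 6: ignored
        boolean installed = nodeH.waitForTopology(8, 200);  // step 7: never satisfied
        System.out.println("rebalance applied=" + applied + ", topology 8 installed=" + installed);
    }
}
```

      Running `main` prints `rebalance applied=false, topology 8 installed=false` after the 200 ms timeout. Without a timeout the wait would never return, which is one plausible way the threads handling GET_TRANSACTION on NodeH end up stuck.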
      

      Possible solutions are:

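      • send the REBALANCE_START/CH_UPDATE commands asynchronously
      • throw an exception when a GET_TRANSACTION request is received while the node is shutting down, instead of waiting for a topology that will never be installed locally.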
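      The second option could look roughly like the following. Again this is a hypothetical sketch, not Infinispan's API: `FailFastTopologyWait`, `NodeLeavingException`, `markLeaving`, `installTopology` and `waitForTopology` are illustrative names. Marking the node as leaving wakes every blocked GET_TRANSACTION handler, and the wait loop throws instead of waiting for a topology that will never be installed, so the requesting node presumably receives an error reply it can handle rather than no reply at all.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the fail-fast option; names are illustrative, not Infinispan's API. */
public class FailFastTopologyWait {

    /** Hypothetical exception signalling that the request cannot be served. */
    static final class NodeLeavingException extends RuntimeException {
        NodeLeavingException(String message) {
            super(message);
        }
    }

    static final class NodeTopology {
        private int topologyId;
        private boolean leaving;

        NodeTopology(int initialTopologyId) {
            this.topologyId = initialTopologyId;
        }

        synchronized void markLeaving() {
            leaving = true;
            notifyAll(); // wake blocked GET_TRANSACTION handlers so they can fail fast
        }

        synchronized void installTopology(int newTopologyId) {
            if (!leaving && newTopologyId > topologyId) {
                topologyId = newTopologyId;
                notifyAll();
            }
        }

        /** Waits for the requested topology, but throws instead of waiting while the node is leaving. */
        synchronized void waitForTopology(int requested, long timeoutMillis) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (topologyId < requested) {
                if (leaving) {
                    throw new NodeLeavingException(
                            "Node is leaving; topology " + requested + " will not be installed");
                }
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new IllegalStateException("Timed out waiting for topology " + requested);
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NodeTopology nodeH = new NodeTopology(7);
        String[] outcome = new String[1];
        // Simulates the handler for GET_TRANSACTION(8) from step 7.
        Thread handler = new Thread(() -> {
            try {
                nodeH.waitForTopology(8, 10_000);
                outcome[0] = "installed";
            } catch (NodeLeavingException e) {
                outcome[0] = "rejected: " + e.getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                outcome[0] = "interrupted";
            }
        });
        handler.start();
        nodeH.markLeaving(); // NodeH starts leaving while the handler is pending
        handler.join();      // join() makes outcome[0] visible to this thread
        System.out.println(outcome[0]);
    }
}
```

      Here the handler thread ends with `rejected: Node is leaving; topology 8 will not be installed`, whether `markLeaving` runs before or after it starts waiting, because the leaving flag is checked on every pass through the wait loop.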

      Attachments: trace.log (474 kB), thread-dump.txt (128 kB)

      People: Dan Berindei (Inactive) <dberinde@redhat.com>, Pedro Ruivo <pruivo@redhat.com>
      Votes: 0; Watchers: 5
