Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-2872

CH_UPDATE from new coord may crash rebalance from old coord

    XMLWordPrintable

    Details

      Description

      This happened probably the first time, but the issue is here:

      When old coordinator leaves the cluster, it sends a REBALANCE_START as a goodbye. This will trigger rebalance process on some of the nodes. As we do sync GET_TRANSACTIONS, processing this command may take a while.

      However, new coordinator will send CH_UPDATE, which will change the current topologyId to a higher id. This command is processed in LocalTopologyManagerImpl synchronized on cacheStatus, but rebalance command has already left its synchronized block when it executes handler.rebalance.

      Then, as the old REBALANCE_START tries to call notifyTransactionDataReceived in its finally block, it finds out that the topologyId has increased and throws an exception. But the rebalance is left in inconsistent state (activeTopologyUpdates are non-zero, potentionally waitForState true, DataRehash listener notification not called...).

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

                People

                • Assignee:
                  dan.berindei Dan Berindei
                  Reporter:
                  rvansa Radim Vansa
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: