Infinispan / ISPN-13160

Write may block topology update forever


      ISPN-10753 changed StateTransferLockImpl to use a StampedLock, which is not reentrant.

      Acquiring the read lock twice from the same thread is possible because the read lock is not exclusive, but it carries a deadlock risk when another thread requests the write lock between the two acquisitions:

      1. thread 1 acquires the read lock in EntryWrappingInterceptor.applyChanges()
      2. thread 2 tries to acquire the write lock in StateConsumerImpl.onTopologyUpdate() and blocks
      3. thread 1 tries to acquire the read lock a second time in ClusteringDependentLogic.DistributionLogic.commitSingleEntry() and also blocks, waiting behind thread 2, which is itself waiting for thread 1 to release its first read lock
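
      The three steps above can be reproduced in isolation. The following is only a minimal sketch: WriterPreferringLock is a hypothetical toy lock written for this illustration (non-reentrant, and new readers wait whenever a writer is waiting). It is neither StampedLock nor Infinispan code, and all names in it are assumptions. The second read acquisition uses a timeout so that the demo reports the blocked state instead of hanging.

```java
import java.util.concurrent.TimeUnit;

class ReadLockDeadlockDemo {

    /** Toy non-reentrant read/write lock (hypothetical): new readers wait while a writer is waiting. */
    static final class WriterPreferringLock {
        private int readers;
        private int waitingWriters;
        private boolean writing;

        /** Tries to take a read lock, giving up after timeoutMillis. */
        synchronized boolean tryReadLock(long timeoutMillis) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (writing || waitingWriters > 0) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // timed out behind a writer
                }
                wait(remainingMillis);
            }
            readers++;
            return true;
        }

        synchronized void unlockRead() {
            readers--;
            notifyAll();
        }

        /** Blocks until no reader and no other writer holds the lock. */
        synchronized void writeLock() throws InterruptedException {
            waitingWriters++;
            try {
                while (writing || readers > 0) {
                    wait();
                }
                writing = true;
            } finally {
                waitingWriters--;
                notifyAll();
            }
        }

        synchronized void unlockWrite() {
            writing = false;
            notifyAll();
        }

        synchronized boolean hasWaitingWriter() {
            return waitingWriters > 0;
        }
    }

    /** Replays steps 1-3; returns true if the nested read acquisition succeeded in time. */
    static boolean nestedReadSucceeds(long timeoutMillis) throws InterruptedException {
        WriterPreferringLock lock = new WriterPreferringLock();

        // Step 1: "thread 1" (the calling thread) acquires the read lock.
        if (!lock.tryReadLock(timeoutMillis)) {
            throw new IllegalStateException("uncontended read lock should succeed");
        }

        // Step 2: "thread 2" requests the write lock and blocks behind the reader.
        Thread writer = new Thread(() -> {
            try {
                lock.writeLock();
                lock.unlockWrite();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "topology-update");
        writer.start();
        while (!lock.hasWaitingWriter()) {
            Thread.sleep(1);
        }

        // Step 3: thread 1 requests the read lock again while the writer is waiting.
        boolean acquired = lock.tryReadLock(timeoutMillis);
        if (acquired) {
            lock.unlockRead();
        }

        // Releasing the outer read lock is what finally lets the writer proceed.
        lock.unlockRead();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // With this toy lock the nested acquisition times out, so this prints "false".
        System.out.println("nested read lock acquired: " + nestedReadSucceeds(200));
    }
}
```

      Without the timeout, step 3 would wait indefinitely: the writer cannot proceed until the outer read lock is released, and the thread holding it is stuck waiting behind the writer. Dropping the outer acquisition, so that each code path takes the read lock at most once, removes the cycle.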

      This actually happens in the test suite, causing random failures in NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut():

      11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [StateConsumerImpl] Received new topology for cache defaultcache, isRebalance = false, isMember = true, topology = CacheTopology{id=8, phase=READ_NEW_WRITE_ALL, rebalanceId=3, currentCH=DefaultConsistentHash{ns=1, owners = (2)[Test-NodeA: 1+0, Test-NodeB: 0+1]}, pendingCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+0, Test-NodeC: 1+0]}, unionCH=DefaultConsistentHash{ns=1, owners = (3)[Test-NodeA: 0+1, Test-NodeB: 0+1, Test-NodeC: 1+0]}, actualMembers=[Test-NodeA, Test-NodeB, Test-NodeC], persistentUUIDs=[a861c235-ccf1-4f01-857e-5810a9bbced0, 48f2b81e-7f18-456c-ab36-b587e8a3a235, a57502c1-343f-47f8-86ed-d87a6cda39b0]}
      ### This message is logged between the two read lock acquisitions
      11:35:10,740 TRACE (jgroups-8,Test-NodeA:[]) [EntryWrappingInterceptor] About to commit entry ReadCommittedEntry(f3a7a72){key=testkey, value=v1, oldValue=null, isCreated=true, isChanged=true, isRemoved=false, isExpired=false, isCommited=false, skipLookup=false, metadata=EmbeddedExpirableMetadata{version=null, lifespan=-1, maxIdle=-1}, oldMetadata=null, internalMetadata=null}
      11:35:10,740 TRACE (non-blocking-thread-Test-NodeA-p39961-t1:[]) [DefaultSegmentedDataContainer] Ensuring segments {0} are started
      11:36:10,770 ERROR (testng-Test:[]) [TestingUtil] Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
      11:36:10,771 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut
      java.lang.RuntimeException: Cache defaultcache timed out waiting for rebalancing to complete on node Test-NodeA, expected member list is [Test-NodeA, Test-NodeB, Test-NodeC], current member list is [Test-NodeA, Test-NodeB]!
      	at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:452) ~[test-classes/:?]
      	at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.doTest(NonTxPrimaryOwnerBecomingNonOwnerTest.java:168) ~[test-classes/:?]
      	at org.infinispan.distribution.rehash.NonTxPrimaryOwnerBecomingNonOwnerTest.testPrimaryOwnerChangingDuringPut(NonTxPrimaryOwnerBecomingNonOwnerTest.java:68) ~[test-classes/:?]
      

      The read lock in EntryWrappingInterceptor.applyChanges() seems to be obsolete: there are other code paths in EntryWrappingInterceptor that also commit entries, but do not acquire the state transfer read lock. Removing this read lock acquisition should fix the deadlock.

            dberinde@redhat.com Dan Berindei (Inactive)