Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-8110

OffHeapBoundedDataContainer.ensureSize() busy loop takes too long

    XMLWordPrintable

Details

    • Task
    • Resolution: Done
    • Major
    • 9.1.1.Final
    • 9.1.0.Final
    • Core
    • None

    Description

      OffHeapBoundedDataContainer.ensureSize() needs to acquire the lruLock in order to check the container size, and then it needs to acquire the write lock of the entry pointed at by the LRU list head (firstAddress) in order to remove it.

      Currently it tries to acquire the entry write lock with locks.getLockFromHashCode(hashCode).writeLock().tryLock() while holding the lruLock. But the entry's read-write lock is actually shared by a bucket of entries, so another thread reading a completely different entry fail the tryLock and will force ensureSize() to unlock lruLock and try again.

      Even worse, a regular put will acquire the entry/bucket's write lock first, and then try to acquire the lruLock while holding the bucket lock. Because lruLock isn't fair, it's possible for 2 or more ensureSize threads to block a writer thread for a long time.

      OffHeapBoundedSingleNodeTest.testMultiThreaded shows a different pathological scenario, one that I don't have an explanation for yet. At least with trace logging enabled, the ensureSize() loop can repeat for 100+ milliseconds, but without any other thread logging anything:

      15:52:20,012 TRACE (ForkThread-3,Test:[]) [BoundedOffHeapDataContainer] Removing first LRU node at 140339935456768
      15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved long value 140339734124528 from address 140339935456784
      15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Wrote long value 0 to address 140339734124536
      15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating off heap memory at 140339935456768 with 28 bytes.  Total size: 8727
      15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140340069670800 to [B@38d5b85f offset by 16 with a total of 17 bytes
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating off heap memory at 140340069670784 with 57 bytes.  Total size: 8670
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 140339868355200 from address 140339734124528
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing entry: 140339868355200 due to eviction due to size 102 being larger than maximum of 100
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355216 to [B@5e23dca1 offset by 16 with a total of 17 bytes
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355233 to [B@50382f4a offset by 16 with a total of 11 bytes
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 0 from address 140339868355208
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355216 to [B@3b554f41 offset by 16 with a total of 17 bytes
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355233 to [B@472da290 offset by 16 with a total of 11 bytes
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355244 to [B@7aba800 offset by 16 with a total of 0 bytes
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355244 to [B@167885c0 offset by 16 with a total of 13 bytes
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 140339734124528 from address 140339868355200
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      ...
      15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
      15:52:20,147 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing first LRU node at 140339734124528
      

      Off-topic, it's a bit weird that UnsafeWrapper logs every address being read from/written to, but the data container doesn't log the keys and values that it reads/writes...

      I propose (after discussing this with Will) that ensureSize() should acquire the bucket lock after releasing the lruLock, so it can use lock() instead of tryLock().

      Attachments

        Issue Links

          Activity

            People

              wburns@redhat.com Will Burns
              dberinde@redhat.com Dan Berindei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: