Infinispan / ISPN-6350

Data race in the ShardIndexManager under topology changes

Details

    Description

      The following sequence illustrates a data race that can cause unrecoverable errors during indexing:

      [node1] cache.put(key) // key maps to segment 48, owned by node1
      [node1] starts shard 48
      [node1] acquires lock on shard 48
      [node1] starts writing to the index
      [node1] topology change notification received, lock released on shard 48
      [node1] lock reacquired (still writing to the index)
      [node1] commit on shard 48
      [node1] shard still locked
      [node2] cache.put(key) // Node2 now owns segment 48
      [node2] starts shard 48
      [node2] tries to acquire the lock on shard 48
      [node2] fails (lock still owned by node1)

      The mechanism currently used by the ShardIndexManager during topology changes relies on a listener that closes the IndexWriter on all nodes when ownership changes, so that the lock is released and can be reacquired by the new owner (each segment maps to exactly one shard).
      Since writing to a shard can take some time, the listener can be triggered in the middle of an index operation. In that case, closing the IndexWriter releases the lock only for a very short time: the in-flight write immediately reacquires it, and it is never released again.
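
The interleaving above can be replayed deterministically with a small stand-alone sketch. It is only a model of the ownership problem, not the actual implementation: `ShardLock` and `ShardLockRaceSketch` are hypothetical names introduced for illustration, and the lock is reduced to a single owner field rather than the real index lock used by the IndexWriter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the per-shard index lock; not Infinispan or Lucene API. */
final class ShardLock {
    private String owner; // null means the lock is free

    /** Succeeds if the lock is free or already held by the same node. */
    boolean tryAcquire(String node) {
        if (owner == null || owner.equals(node)) {
            owner = node;
            return true;
        }
        return false;
    }

    /** Releases the lock only if the given node currently holds it. */
    void release(String node) {
        if (node.equals(owner)) {
            owner = null;
        }
    }

    String owner() {
        return owner;
    }
}

final class ShardLockRaceSketch {
    /** Replays the sequence from the description and returns a log of what happened. */
    static List<String> replay() {
        List<String> log = new ArrayList<>();
        ShardLock shard48 = new ShardLock();

        // node1 owns segment 48, starts the shard and begins writing.
        log.add("node1 acquire: " + shard48.tryAcquire("node1"));

        // Topology change: the listener closes the writer, which releases the lock.
        shard48.release("node1");
        log.add("after listener, owner: " + shard48.owner());

        // The write on node1 is still in flight, so it takes the lock again.
        log.add("node1 reacquire: " + shard48.tryAcquire("node1"));

        // node1 commits, but nothing releases the lock a second time.
        // node2 now owns segment 48 and tries to start the shard.
        log.add("node2 acquire: " + shard48.tryAcquire("node2"));
        return log;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```

In this model the final attempt by node2 fails because the release triggered by the listener happens before the in-flight write finishes, which matches the last step of the sequence in the description.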

            People

              gfernand@redhat.com Gustavo Fernandes (Inactive)