  1. Infinispan
  2. ISPN-4777

Replace command not atomic in REPL_SYNC cache mode

Details

    • Bug
    • Resolution: Obsolete
    • Major
    • None
    • 5.2.6.Final
    • None
    • None

    Description

      We discovered this problem while using the Lucene InfinispanDirectory with DistributedSegmentReadLocker. After some time in production, we found that some Lucene files had been removed from the caches at random but were still present in the file listing entry, which left the index unusable.

      We managed to replicate the problem in a test that acquires and releases read locks concurrently and checks for file deletion. The test fails quickly in REPL_SYNC mode, but runs for a while with DIST_SYNC.
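
      The original test is not attached, so the following is only a rough sketch of what such a test might look like. It assumes the read-lock counter is kept under a single key and changed with the `ConcurrentMap.replace(key, oldValue, newValue)` contract that an Infinispan `Cache` exposes. A local `ConcurrentHashMap` stands in for the clustered cache, and the class name, key name and method names are hypothetical. File deletion at a zero count is left out; the sketch only checks that the counter returns to its starting value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReadLockStressSketch {
    // Hypothetical key holding the reader count for one index file.
    static final String LOCK_KEY = "segments_1|readlock";

    // Adds delta to the counter, retrying until the conditional replace succeeds.
    static void adjust(ConcurrentMap<String, Integer> cache, int delta) {
        while (true) {
            Integer current = cache.get(LOCK_KEY);
            if (current == null) {
                throw new IllegalStateException("lock entry missing");
            }
            if (cache.replace(LOCK_KEY, current, current + delta)) {
                return;
            }
        }
    }

    // Runs concurrent acquire/release pairs and returns the final counter value.
    static int run(ConcurrentMap<String, Integer> cache, int threads, int iterations) {
        cache.put(LOCK_KEY, 1); // assumed starting value of the counter
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        adjust(cache, +1); // acquire read lock
                        adjust(cache, -1); // release read lock
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from the worker threads
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return cache.get(LOCK_KEY);
    }

    public static void main(String[] args) {
        int finalCount = run(new ConcurrentHashMap<>(), 4, 10_000);
        System.out.println("final counter = " + finalCount);
    }
}
```

      With an atomic `replace`, every acquire is matched by a release, so the counter should end where it started. A final value that differs from the starting value would indicate lost updates of the kind described here.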

      Extra logging indicated that when the replace command used to increment the lock counter is called concurrently from multiple cluster members, the counter is incremented only once even though both calls report success. Because the number of readers is now under-counted, the file is eventually deleted. We also observed the opposite effect when releasing: the counter was decremented by only one.
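
      To make the expected behaviour concrete, here is a small sketch of the conditional-replace contract, again using a local `ConcurrentHashMap` as a stand-in for the cache. Two callers read the same old value and both try to replace it. The class, method and key names are hypothetical, and the sequential calls only imitate the interleaving of two cluster members.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReplaceContractSketch {
    // Two callers observe the same value, then each attempts the same increment.
    static boolean[] racingReplace(ConcurrentMap<String, Integer> cache, String key) {
        cache.put(key, 1);
        Integer seenByA = cache.get(key); // caller A reads 1
        Integer seenByB = cache.get(key); // caller B reads 1
        boolean a = cache.replace(key, seenByA, seenByA + 1); // 1 -> 2 succeeds
        boolean b = cache.replace(key, seenByB, seenByB + 1); // value is now 2, so this fails
        return new boolean[] {a, b};
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
        boolean[] results = racingReplace(cache, "readers");
        // prints "true false counter=2"
        System.out.println(results[0] + " " + results[1] + " counter=" + cache.get("readers"));
    }
}
```

      Under this contract the second caller sees `false` and must re-read and retry, which eventually brings the counter to 3. The behaviour reported in this issue corresponds to both calls returning `true` while the counter stays at 2.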

      Our conclusion is that the replace command is not atomic in REPL_SYNC mode, but works in the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.

          People

            gfernand@redhat.com Gustavo Fernandes (Inactive)
            chp-anujs Anuj Shah (Inactive)
            Votes: 1
            Watchers: 5
