Infinispan / ISPN-3455

Cache replication not warranted under load

This issue belongs to an archived project.

    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • Affects Versions: 5.3.0.Final, 6.0.0.Final
    • Component: Core

      Problem:

      When running a replicated cache and repeatedly calling a cacheable method (using the Spring cache abstraction), Infinispan enters an infinite replication loop. This can be confirmed by observing replication counts that grow over time even though there are no cache misses.

      Expected behavior:

      A cache hit should not trigger replication.

      Test case:

      • 3 cluster members; asynchronous replication with a replication queue
      • a cacheable method is executed repeatedly using 2 different keys
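The test case above can be sketched in plain Java, without Spring or Infinispan. The `Region` enum and the method names are hypothetical stand-ins for the project's cacheable method; the point is the expected baseline behavior that the bug violates: with 2 keys, the method body runs exactly twice and every later invocation is a cache hit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CacheableSketch {
    enum Region { EAST, WEST }                       // hypothetical enum key type

    static final Map<Region, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger executions = new AtomicInteger();

    // Stand-in for a @Cacheable method: compute once per key, serve hits afterwards.
    static String expensiveLookup(Region key) {
        return cache.computeIfAbsent(key, k -> {
            executions.incrementAndGet();            // counts real executions (misses)
            return "value-for-" + k.name();
        });
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {              // repeated invocations with 2 keys
            expensiveLookup(Region.EAST);
            expensiveLookup(Region.WEST);
        }
        // The method body ran only twice; all later calls were cache hits.
        System.out.println("executions=" + executions.get());
        System.out.println("size=" + cache.size());
    }
}
```

In the bug scenario the per-node cache behaves exactly like this (2 entries, no misses), yet the replication counts keep climbing, which is what makes the replication traffic unwarranted.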

      Notes:

      • for some reason, this issue only occurs when using Enum arguments for a cache key; I was not able to replicate this when using int or String types (see com.designamus.infinispan.Main.works())
      • the behavior is not deterministic (random), which points to a race condition
      • the problem does not seem to be related to Spring's default cache key generator; I was able to reproduce the same behavior with a custom, thread-safe cache key generator
      • the cacheable method is executed only twice (once per key); once both keys are stored in the cache, subsequent invocations retrieve the stored values; this can be confirmed by inspecting the log file
      • the cache doesn't expire and entries are not evicted
      • the memory usage grows over time, eventually causing OOM on a heavily loaded system
      • since the issue is random in nature, it may take 3-4 attempts to reproduce it; I was successful in reproducing this behavior numerous times

      Steps to reproduce:

      1. Build a test project (mvn clean compile)

      2. Execute /run.sh (this will spawn 3 JVMs)

      3. Start JConsole to monitor 3 cluster members (jconsole localhost:17001 localhost:17002 localhost:17003)

      4. Monitor "replicationCount" attribute under RpcManager for cache "MyCache" for all JVMs (see /replication-counts.png)

      5. Observe that replication counts grow over time

      6. Observe that all caches are of size 2 and there are no cache misses (see /cache-statistics.png)

      If the issue cannot be reproduced (replication counts stay at the same level):

      7. Terminate all 3 JVM processes (as a convenience you can execute /stop.sh)

      8. Repeat steps 2 through 6 above

      When testing the above scenario using distribution mode, I observed some other anomalies (e.g. the cacheable method was executed multiple times, as if the value was not there). While this may be related, it deserves a separate JIRA.


            Mircea Markus (Inactive) added a comment - Closing this as it is not an Infinispan issue.

            Lukasz Szelag (Inactive) added a comment -

            Giovanni, you are right - it is not Infinispan but inconsistent hash codes in enum constants that are causing this behavior. I ended up creating a wrapper class that calculates the hash code from the underlying enum's class name and the enum constant's name. This issue affects all classes that inherit hashCode() from Object.

            Thanks a lot for your help in resolving this!
            Lukasz
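The wrapper described in the comment above might look like the following sketch. The class name `EnumKey` and its shape are illustrative (the original project's code is not attached); the key idea is that `String.hashCode` is specified by the JLS, so a hash built from the enum's class name and constant name is identical on every JVM in the cluster, unlike `Enum.hashCode`.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.concurrent.TimeUnit;

// Illustrative wrapper: derives the cache key's hashCode from stable strings
// (declaring class name + constant name) instead of Enum's identity-based hash.
public final class EnumKey<E extends Enum<E>> implements Serializable {
    private final E value;

    public EnumKey(E value) {
        this.value = Objects.requireNonNull(value);
    }

    public E value() { return value; }

    @Override
    public int hashCode() {
        // String.hashCode is defined by the JLS, so this is the same on every JVM.
        return 31 * value.getDeclaringClass().getName().hashCode()
                + value.name().hashCode();
    }

    @Override
    public boolean equals(Object o) {
        // Enum constants are singletons within a JVM, so reference equality is safe here.
        return o instanceof EnumKey && value == ((EnumKey<?>) o).value;
    }

    @Override
    public String toString() {
        return value.getDeclaringClass().getName() + "." + value.name();
    }

    public static void main(String[] args) {
        EnumKey<TimeUnit> key = new EnumKey<>(TimeUnit.SECONDS);
        System.out.println(key + " -> " + key.hashCode());
    }
}
```

With this wrapper used as the Spring cache key (e.g. from a custom key generator), all cluster members compute the same hash for the same logical key.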

            Giovanni Meo (Inactive) added a comment -

            Lukasz, I have just spent a week debugging the same issue. The issue is not due to Infinispan but to the use of enums as keys. For Infinispan to work, it must be able to compute a hash code for each key that is consistent across the cluster: if the key is Key1, then node1, node2 and node3 must all produce the same hash code for it. When the key is an enum, that is not possible, because a Java enum generates a hash code that is consistent only within one node; when the key is transported across JVMs, the hash code is not guaranteed to come out the same. You might think you could override hashCode() on the enum to generate a consistent one - sorry, that is not possible; see this Sun bug report: http://bugs.sun.com/view_bug.do?bug_id=6373406
            In my case I ended up creating a wrapper around the enum key and forcing a consistent hash code for that wrapper.
            For a code reference to where the problem starts in Infinispan, look at:
            https://github.com/infinispan/infinispan/blob/5.3.x/core/src/main/java/org/infinispan/interceptors/distribution/BaseDistributionInterceptor.java
            lines 127-130; that is where the trouble starts.
            Bottom line, as I said, this is an issue caused by the Java enum design.
            For reference, a bugfix I made for this issue in my code is at:
            https://git.opendaylight.org/gerrit/#/c/1292/

            Hope it helps.
            Giovanni
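The two claims in this comment can be checked in a few lines: `Enum.hashCode()` is declared `final` (so it cannot be overridden, which is the Sun bug referenced above), and it simply returns the identity hash, which is assigned per JVM and therefore differs between cluster nodes for the same logical constant. `TimeUnit` stands in here for any enum key.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.concurrent.TimeUnit;

public class EnumHashDemo {
    public static void main(String[] args) throws Exception {
        // Enum declares hashCode() as final, so subclasses (enum types)
        // cannot override it with a cluster-stable implementation.
        Method hashCode = Enum.class.getMethod("hashCode");
        System.out.println("final=" + Modifier.isFinal(hashCode.getModifiers()));

        // Enum.hashCode() returns Object's identity hash, not a value derived
        // from the constant's name, so each JVM computes its own number.
        TimeUnit key = TimeUnit.SECONDS;
        System.out.println("identity=" + (key.hashCode() == System.identityHashCode(key)));
    }
}
```

Because the identity hash feeds into the consistent-hash routing in BaseDistributionInterceptor, each node can route the "same" enum key to a different owner, which matches the endless replication observed in this issue.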

            Lukasz Szelag (Inactive) added a comment - I was not able to replicate this issue using version 5.2.7.Final. It appears that this bug was introduced in 5.3.0.Final.

            Lukasz Szelag (Inactive) added a comment - This issue is also present in the latest version (6.0.0.Alpha3).

            Lukasz Szelag (Inactive) added a comment - In our system (6 clustered nodes), the replication counts continue to grow rapidly even after there is no caching activity at all (initially, the caches are populated as data is processed). For example, there are only 18 entries in the cache, yet the replication count is close to 1 million. This eventually causes the system to run out of memory with a 20 GB heap.

            Lukasz Szelag (Inactive) added a comment - A test project to reproduce the issue.

              Assignee: Mircea Markus (Inactive)
              Reporter: Lukasz Szelag (Inactive)
              Archiver: Amol Dongare