Infinispan / ISPN-11017

Cluster fails and doesn't recover under load


Details

    • Bug
    • Resolution: Obsolete
    • Critical
    • None
    • 10.0.1.Final
    • Server
    • None
    • Steps to reproduce:
      • Start a fresh Infinispan cluster with 12 replicas
      • Start the load test
      • Wait a few seconds

    Description

      After the load test has been running for a few seconds, the Infinispan cluster stops accepting requests and nodes start splitting off from the cluster. The server log contains a large number of exceptions such as:

      10:42:26,939 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache '___protobuf_metadata', writing keys [deviceRegistry.proto]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key deviceRegistry.proto and requestor GlobalTx:infinispan-2-61958:249. Lock is held by GlobalTx:infinispan-2-61958:248
      

      Stopping the load test does not allow the cluster to recover. Most (but not all) liveness checks fail and the pods are restarted, yet even after one hour the cluster is still not working.
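      The log line above shows two transactions from the same node (GlobalTx:infinispan-2-61958:248 and :249) contending for the lock on the single key deviceRegistry.proto in the ___protobuf_metadata cache, with the later one giving up after 10 seconds. The sketch below is a hypothetical Python model of that general pattern, not Infinispan code and not a confirmed diagnosis of this issue: the function name, the first-come-first-served ordering and the hold times are assumptions. It shows how writers serialized on one hot key start failing once their queueing delay exceeds the lock acquisition timeout.

```python
from typing import List, Sequence


def simulate_single_key_writers(
    arrivals: Sequence[float], hold_time: float, lock_timeout: float
) -> List[bool]:
    """Hypothetical model: writers that all need the same exclusive per-key lock.

    Writers are served first-come, first-served. A writer that would have to
    wait longer than ``lock_timeout`` gives up (analogous to ISPN000299) and
    never holds the lock; a writer that gets the lock keeps it for
    ``hold_time`` seconds. Returns one flag per writer, in arrival order:
    True if it acquired the lock.
    """
    lock_free_at = 0.0  # time at which the current holder releases the lock
    acquired: List[bool] = []
    for arrival in sorted(arrivals):
        start = max(arrival, lock_free_at)  # earliest moment this writer could lock
        if start - arrival > lock_timeout:
            acquired.append(False)  # timed out while queued; lock state unchanged
        else:
            acquired.append(True)
            lock_free_at = start + hold_time  # holds the lock for hold_time seconds
    return acquired


if __name__ == "__main__":
    # Assumed numbers: five writers hit deviceRegistry.proto at t=0, each
    # holding the lock for 4 s, with the 10 s timeout seen in the log.
    print(simulate_single_key_writers([0, 0, 0, 0, 0], hold_time=4, lock_timeout=10))
    # → [True, True, True, False, False]
```

      In this model, adding writers to the same key does not increase throughput; it only lengthens the queue, so every writer beyond the timeout window fails regardless of how many nodes the cluster has.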


          People

            Assignee: Unassigned
            Reporter: Jens Reimann (jreimann-2)
            Votes: 0
            Watchers: 2
