JBoss Enterprise Application Platform / JBEAP-25179

[CLUSTERING]: Shutdown causes "Lost data because of abrupt leavers"

      Scenario: we have a four-node cluster and we trigger fail-over by shutting the nodes down one by one in sequence (EAP shutdown).

      We noticed the following FATAL error:

      ISPN000313: Lost data because of abrupt leavers
      

      We noticed the error in two circumstances:

      When node 2 is failed:

      10:16:32 ============================================================
      10:16:32  Shutting down service WildFly Service 2 on node 10.0.102.100 with a 15 seconds suspend timeout
      10:16:32 ============================================================
      10:16:32 
      10:16:32 08:16:32.912 [main] DEBUG o.j.e.c.i.i.u.WildFlyServiceInstrumentationProviderSunstone - service WildFly Service 2 has been suspend and shut down with a 15 seconds timeout
      

      On node 3:

      2023-07-12 08:16:33,495 INFO  [org.infinispan.LIFECYCLE] (non-blocking-thread--p4-t1) [Context=org.infinispan.CONFIG] ISPN100010: Finished rebalance with members [wildfly3, wildfly4, wildfly1], topology id 28
      2023-07-12 08:16:33,496 FATAL [org.infinispan.CLUSTER] (non-blocking-thread--p7-t4) [Context=clusterbench-ee10.ear.clusterbench-ee10-web-granular.war] ISPN000313: Lost data because of abrupt leavers [wildfly1, wildfly2]
      2023-07-12 08:16:33,496 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p7-t4) [Context=clusterbench-ee10.ear.clusterbench-ee10-web-granular.war] ISPN100007: After merge (or coordinator change), recovered members [wildfly3, wildfly4] with topology id 20
      

      Note that wildfly1 did not leave the cluster at this point, yet it is reported as an abrupt leaver.
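
      The check described above can be automated when scanning server logs. The following is a minimal sketch, not part of the test suite: the helper names `abrupt_leavers` and `unexpected_leavers` are hypothetical, and it assumes that "WildFly Service 2" corresponds to the cluster member `wildfly2` and that it was the only node shut down at this point.

```python
import re

# Matches the member list that follows the ISPN000313 message in a log line.
LEAVERS_RE = re.compile(
    r"ISPN000313: Lost data because of abrupt leavers \[([^\]]*)\]"
)


def abrupt_leavers(log_line):
    """Return the members named in an ISPN000313 line, or an empty list."""
    match = LEAVERS_RE.search(log_line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def unexpected_leavers(log_line, shut_down):
    """Return reported leavers that the test did not actually shut down."""
    return [name for name in abrupt_leavers(log_line) if name not in shut_down]


# Log line copied from node 3 in this report.
line = (
    "2023-07-12 08:16:33,496 FATAL [org.infinispan.CLUSTER] "
    "(non-blocking-thread--p7-t4) "
    "[Context=clusterbench-ee10.ear.clusterbench-ee10-web-granular.war] "
    "ISPN000313: Lost data because of abrupt leavers [wildfly1, wildfly2]"
)

# Assumption: only wildfly2 had been shut down when this line was logged.
print(unexpected_leavers(line, {"wildfly2"}))  # prints ['wildfly1']
```

      Any non-empty result marks a member that was reported as an abrupt leaver without having been shut down by the test.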

      At the end of the test, when all nodes are shut down in sequence:

      Node 1 is shut down:

      10:38:00 ============================================================
      10:38:00  Shutting down service WildFly Service 1 on node 10.0.99.147 with a 15 seconds suspend timeout
      10:38:00 ============================================================
      10:38:00 
      10:38:00 08:38:00.716 [main] DEBUG o.j.e.c.i.i.u.WildFlyServiceInstrumentationProviderSunstone - service WildFly Service 1 has been suspend and shut down with a 15 seconds timeout
      

      On node 2:

      2023-07-12 08:38:01,549 FATAL [org.infinispan.CLUSTER] (non-blocking-thread--p9-t4) [Context=clusterbench-ee10-2.ear.clusterbench-ee10-web.war] ISPN000313: Lost data because of abrupt leavers [wildfly4, wildfly1]
      

            pferraro@redhat.com Paul Ferraro
            tborgato@redhat.com Tommaso Borgato