  WildFly / WFLY-5477

Failback fails with ActiveMQIllegalStateException during synchronization with live server


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • Fix Version/s: 10.0.0.CR5
    • Affects Version/s: 10.0.0.CR2
    • Component/s: JMS
    • Labels: None

    Description

      Synchronization between the live and backup servers sometimes fails during failback with ActiveMQIllegalStateException. As a result, the live server does not activate and the backup stops, so neither server is active to serve clients.

      Test scenario:
      1. Start two EAP 7.0.0.DR11 servers with Artemis configured in a dedicated topology with a replicated journal:
      – the 1st EAP server has Artemis configured as live, the 2nd EAP server has Artemis configured as backup
      – queues InQueue and OutQueue are deployed
      2. Send 2000 messages to InQueue on the 1st server (live).
      3. Start a 3rd EAP 7.0.0.DR11 server with an MDB that consumes from the remote InQueue and sends to the remote OutQueue in an XA transaction:
      – the resource adapter is configured for failover
      4. Kill the live server while the MDB is processing messages.
      5. Wait for the backup to activate and for failover to complete.
      6. Start the live server again and wait for failback.

      In step 6, synchronization between live and backup sometimes fails during failback with the following exception:

      10:05:13,493 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ119026: Backup Server was not yet in sync with live]
              at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:232) [artemis-server-1.1.0.jar:1.1.0]
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
      

      and the live server never activates. The backup server also stops with:

      10:05:17,846 INFO  [org.apache.activemq.artemis.core.server] (Thread-108) AMQ221002: Apache ActiveMQ Artemis Message Broker version 1.1.0 [706b0cb8-6b69-11e5-904d-fd646d33ece8] stopped
      10:05:17,846 INFO  [org.apache.activemq.artemis.core.server] (Thread-108) AMQ221039: Restarting as Replicating backup server after live restart
      

      so the live/backup pair is dead and the server with the MDB loses its connection.
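To make the reported outcome of step 6 concrete, here is a hypothetical Java model of it. It is not Artemis code, and every name in it (FailbackModel, Role, Node, failback, server1, server2) is invented for illustration. It assumes only what the logs above show: the restarted original live either finishes replication and activates, or fails activation with AMQ119026, while the former backup gives up the live role and restarts as a replicating backup (AMQ221039). Under those assumptions, an incomplete sync leaves no node in the live role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the failback outcome described in WFLY-5477.
// All names here are illustrative; this is NOT how Artemis implements failback.
public class FailbackModel {

    enum Role { LIVE, BACKUP, STOPPED }

    static final class Node {
        final String name;
        Role role;

        Node(String name, Role role) {
            this.name = name;
            this.role = role;
        }
    }

    /**
     * Models step 6: server1 (original live) restarts and replicates from
     * server2 (former backup, live since failover).
     *
     * @param inSync whether replication completed before server1 tried to activate
     * @return names of the nodes left in the LIVE role
     */
    public static List<String> failback(boolean inSync) {
        Node server1 = new Node("server1", Role.BACKUP); // restarted live, syncing
        Node server2 = new Node("server2", Role.LIVE);   // acting live after failover

        if (inSync) {
            server1.role = Role.LIVE;     // failback succeeds
        } else {
            server1.role = Role.STOPPED;  // activation fails (AMQ224000 / AMQ119026)
        }

        // Per the logs, server2 gives up the live role and restarts as a
        // replicating backup (AMQ221039); the model assumes this happens either way.
        server2.role = Role.BACKUP;

        List<String> live = new ArrayList<>();
        for (Node n : new Node[] { server1, server2 }) {
            if (n.role == Role.LIVE) {
                live.add(n.name);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        System.out.println("in sync:     live = " + failback(true));
        System.out.println("not in sync: live = " + failback(false));
    }
}
```

Running main prints [server1] for the in-sync case and an empty list for the failing case, which matches the "none of the servers is active" symptom.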

      Logs and configurations from all servers are attached.

      Attachments

        1. backup-logs.zip (7.00 MB)
        2. live-logs.zip (1.33 MB)
        3. mdb-server-logs.zip (7.16 MB)
        4. standalone-full-ha-backup.xml (28 kB)
        5. standalone-full-ha-live.xml (28 kB)
        6. standalone-full-ha-mdb.xml (28 kB)

            People

              Assignee: Clebert Suconic (csuconic@redhat.com)
              Reporter: Miroslav Novak (mnovak1@redhat.com)
              Votes: 0
              Watchers: 3
