JBoss A-MQ / ENTMQ-1323

Broker cannot be restarted after failover in master/slave setup on Windows 2012 shares


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version: JBoss A-MQ 6.3
    • Affects Version: JBoss A-MQ 6.2.1
    • Component: master-slave
    • None
    • Workaround Exists

    Steps to Reproduce

      1) Set up master/slave on two Windows 2012 machines (A and B) with a shared filesystem on a Windows 2012 share (with the restartAllowed=true attribute).
      2) Simulate a network failure between the master (suppose it is on machine A) and the filesystem.
      3) Wait for failover, then re-establish the connection between A and the filesystem.
      4) Simulate a network failure between the new master (on machine B) and the filesystem.
      5) Wait for failover. When the broker on machine A starts, it fails to recover the journal.


    Description

      I have configured a share on Windows 2012 that is used as the shared filesystem for the ActiveMQ database. Machines A and B each run a JBoss A-MQ instance in master/slave mode on top of this shared filesystem. Meanwhile, a Camel route running on a separate JBoss A-MQ instance moves messages between queues. The master broker is on machine A. If there is a network failure between A and the share, the master correctly shuts down and the broker on B becomes master. When restartAllowed="true" is configured for the brokers, the broker on A restarts itself and, once the connection between A and the share is re-established, waits because the database is locked. If another failover happens, the broker on A detects that the database is unlocked and tries to start, but it fails with the following exception:

      | 13:23:50,311 | ERROR | AMQ-1-thread-1 | ActiveMQServiceFactory | 155 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-621070 | Exception on start: Failed to recover data at position:1:32673408|

      java.io.IOException: Failed to recover data at position:1:32673408
      at org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:617)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:400)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:418)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:262)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:205)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:223)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.broker.BrokerService.doStartPersistenceAdapter(BrokerService.java:651)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.broker.BrokerService.startPersistenceAdapter(BrokerService.java:640)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.broker.BrokerService.start(BrokerService.java:605)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.doStart(ActiveMQServiceFactory.java:506)[155:io.fabric8.mq.mq-fabric:1.2.0.redhat-621070]
      at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.access$400(ActiveMQServiceFactory.java:318)[155:io.fabric8.mq.mq-fabric:1.2.0.redhat-621070]
      at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$1.run(ActiveMQServiceFactory.java:449)[155:io.fabric8.mq.mq-fabric:1.2.0.redhat-621070]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_79]
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_79]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_79]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_79]
      at java.lang.Thread.run(Thread.java:745)[:1.7.0_79]
      Caused by: java.io.EOFException: Chunk stream does not exist, page: 2286 is marked free
      at org.apache.activemq.store.kahadb.disk.page.Transaction$2.readPage(Transaction.java:470)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.page.Transaction$2.<init>(Transaction.java:447)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.page.Transaction.openInputStream(Transaction.java:444)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:420)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:377)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.loadNode(BTreeIndex.java:266)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.index.BTreeNode.getChild(BTreeNode.java:233)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.index.BTreeNode.getLeafNode(BTreeNode.java:684)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.index.BTreeNode.put(BTreeNode.java:377)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.put(BTreeIndex.java:189)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.updateIndex(MessageDatabase.java:1307)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase$AddOperation.execute(MessageDatabase.java:2456)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase$16.execute(MessageDatabase.java:1253)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1249)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase$10.visit(MessageDatabase.java:1092)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.data.KahaCommitCommand.visit(KahaCommitCommand.java:130)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1074)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1055)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      at org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:609)[141:org.apache.activemq.activemq-osgi:5.11.0.redhat-621070]
      ... 18 more

      The issue does not happen if the whole A-MQ/Fuse instance is restarted; it only occurs when just the broker is restarted, either through restartAllowed or with the osgi:restart command.
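      The lock hand-off described above (the broker on A waiting while the database is locked, then trying to start once the lock is released) can be sketched with plain java.nio file locks. This is a minimal, hypothetical illustration, not ActiveMQ's own shared file locker: the lock file name "lock", the class SharedLockSketch, the method waitForLock and the polling interval are all assumptions. It only shows the wait-then-acquire pattern; it does not reproduce the journal recovery failure reported here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;

// Hypothetical sketch of a shared-file lock hand-off between two brokers.
// Assumption: both brokers contend for an exclusive OS lock on a file named
// "lock" in the shared data directory. This is NOT ActiveMQ's implementation.
public class SharedLockSketch {

    // Polls until the exclusive lock is obtained. While another process holds it,
    // this node behaves like the waiting slave; once tryLock succeeds it may act
    // as master and open the (assumed) message store.
    static FileLock waitForLock(File dataDir, long pollMillis)
            throws IOException, InterruptedException {
        File lockFile = new File(dataDir, "lock"); // assumed lock file name
        FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
        try {
            while (true) {
                // tryLock returns null if another process holds an overlapping lock
                FileLock lock = channel.tryLock();
                if (lock != null) {
                    return lock;
                }
                Thread.sleep(pollMillis); // slave: keep waiting while the db is locked
            }
        } catch (IOException | InterruptedException | RuntimeException e) {
            channel.close(); // also closes the underlying RandomAccessFile
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("kahadb-sketch").toFile();
        FileLock lock = waitForLock(dir, 1000);
        System.out.println("lock held: " + lock.isValid());
        // Releasing the lock is what allows a waiting node to proceed.
        lock.release();
        lock.channel().close();
    }
}
```

      In this sketch the only coordination between nodes is the OS-level lock, so whether a waiting node can proceed after a network failure depends entirely on when the file server releases that lock.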

      Attachments

        1. fuse-A.log (232 kB)
        2. fuse-B.log (320 kB)

          People

            gtully@redhat.com Gary Tully
            knetl.j@gmail.com Jakub Knetl (Inactive)
            David Kornel
            Votes: 0
            Watchers: 7
