Uploaded image for project: 'FUSE Message Broker'
  1. FUSE Message Broker
  2. MB-883

Kahadb Chunk stream does not exist when broker under stress

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 5.5.0-fuse-00-00
    • Component/s: None
    • Labels:
      None

      Description

      There appears to be a race condition in activemq. We're still working on the test case, but this is what I can tell you:

      2011-04-07 08:04:07,295 [Queue:loadq-4  ] ERROR FilePendingMessageCursor       - I/O error
      java.io.EOFException: Chunk stream does not exist at page: 198
      	at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:454)
      	at org.apache.kahadb.page.Transaction$2.<init>(Transaction.java:431)
      	at org.apache.kahadb.page.Transaction.openInputStream(Transaction.java:428)
      	at org.apache.kahadb.page.Transaction.load(Transaction.java:404)
      	at org.apache.kahadb.page.Transaction.load(Transaction.java:361)
      	at org.apache.activemq.store.kahadb.plist.PList.loadEntry(PList.java:468)
      	at org.apache.activemq.store.kahadb.plist.PList.getNext(PList.java:483)
      	at org.apache.activemq.store.kahadb.plist.PList$10.execute(PList.java:312)
      	at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
      	at org.apache.activemq.store.kahadb.plist.PList.getNext(PList.java:310)
      	at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.next(FilePendingMessageCursor.java:500)
      	at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.next(FilePendingMessageCursor.java:473)
      	at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.next(FilePendingMessageCursor.java:293)
      	at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1714)
      	at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1932)
      	at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1440)
      	at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:104)
      	at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:42)

      This scenario occurs on a really fast machine, e.g. 8 CPU's when running Hiram's Stomp Benchmark protocol against the latest 5.5 release. Unfortunately, the error does not always occur, only sometimes.

      Gary believes the error occurs as result of the file cursor is hitting the 1gb limit set in the config and starting to write to disk. On Vader, I can come close:

      	
      ERROR | Exception on periodic cleanup: java.io.EOFException: Chunk stream does not exist at page: 8490
      java.io.EOFException: Chunk stream does not exist at page: 8490
        at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:454)
        at org.apache.kahadb.page.Transaction$2.<init>(Transaction.java:431)
        at org.apache.kahadb.page.Transaction.openInputStream(Transaction.java:428)
        at org.apache.kahadb.page.Transaction.load(Transaction.java:404)
        at org.apache.kahadb.page.Transaction.load(Transaction.java:361)
        at org.apache.activemq.store.kahadb.plist.PList.loadEntry(PList.java:464)
        at org.apache.activemq.store.kahadb.plist.PList.getNext(PList.java:479)
        at org.apache.activemq.store.kahadb.plist.PList$10.execute(PList.java:309)
        at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
        at org.apache.activemq.store.kahadb.plist.PList.getNext(PList.java:307)
        at org.apache.activemq.store.kahadb.plist.PListStore.run(PListStore.java:331)
        at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
      Exception in thread "ActiveMQ Broker[localhost] Scheduler" java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
        at java.util.HashMap$ValueIterator.next(HashMap.java:822)
        at org.apache.activemq.store.kahadb.plist.PListStore.run(PListStore.java:327)
        at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)

      The java.utilConcurrentModificationException seems to happen somewhere in the topic fan test. I can usually get it sort of consistent by starting with empty data file (rm -rf data) and restarting the broker. I'm just not sure where in that test yet and it's not the kahadb error. If I find anything more, will let you know.

        Gliffy Diagrams

          Activity

          Hide
          garytully Gary Tully added a comment -
          Show
          garytully Gary Tully added a comment - fixed at apache by https://issues.apache.org/jira/browse/AMQ-3325
          Hide
          garytully Gary Tully added a comment -

          fix now on fuse branch.

          Show
          garytully Gary Tully added a comment - fix now on fuse branch.

            People

            • Assignee:
              garytully Gary Tully
              Reporter:
              sjavurek Susan Javurek
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: