Uploaded image for project: 'FUSE Message Broker'
  1. FUSE Message Broker
  2. MB-1199

NPE exception in kahadb after index issue

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 5.5.1-fuse-04-01
    • Fix Version/s: 5.5.1-fuse-10-16
    • Component/s: broker
    • Labels:
      None

      Description

      2012-07-24 07:11:19,476 [n.ch] Scheduler] ERROR Queue                          - Problem retrieving message for browse
      java.util.NoSuchElementException: Could not locate data file /var/lib/activemq/data/activemq_gridmsg104.cern.ch/tmp_storage/db-1093.log
              at org.apache.activemq.store.kahadb.plist.PList$PListIterator.next(PList.java:214)
              at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.next(FilePendingMessageCursor.java:508)
              at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.next(FilePendingMessageCursor.java:492)
              at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.next(FilePendingMessageCursor.java:305)
              at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1053)
              at org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:804)
              at org.apache.activemq.broker.region.Queue.access$100(Queue.java:91)
              at org.apache.activemq.broker.region.Queue$2.run(Queue.java:131)
              at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
              at java.util.TimerThread.mainLoop(Timer.java:512)
              at java.util.TimerThread.run(Timer.java:462)
      Caused by: java.io.IOException: Could not locate data file /var/lib/activemq/data/activemq_gridmsg104.cern.ch/tmp_storage/db-1093.log
              at org.apache.kahadb.journal.Journal.getDataFile(Journal.java:350)
              at org.apache.kahadb.journal.Journal.read(Journal.java:597)
              at org.apache.activemq.store.kahadb.plist.PListStore.getPayload(PListStore.java:394)
              at org.apache.activemq.store.kahadb.plist.PList$PListIterator.next(PList.java:212)
              ... 10 more
       

      After a restart, we now see NPE related to KahaDB:

       
      2012-07-24 11:39:05,287 [n.ch] Scheduler] ERROR Queue                          - Problem retrieving message for browse
      java.lang.NullPointerException
      	at org.apache.kahadb.index.ListNode.access$200(ListNode.java:37)
      	at org.apache.kahadb.index.ListNode$NodeMarshaller.writePayload(ListNode.java:265)
      	at org.apache.kahadb.index.ListNode$NodeMarshaller.writePayload(ListNode.java:255)
      	at org.apache.kahadb.page.Transaction.store(Transaction.java:248)
      	at org.apache.kahadb.index.ListIndex.storeNode(ListIndex.java:326)
      	at org.apache.kahadb.index.ListNode.store(ListNode.java:348)
      	at org.apache.kahadb.index.ListNode.access$500(ListNode.java:37)
      	at org.apache.kahadb.index.ListNode$ListIterator.remove(ListNode.java:227)
      	at org.apache.activemq.store.kahadb.plist.PList$PListIterator$1.execute(PList.java:228)
      	at org.apache.kahadb.page.Transaction.execute(Transaction.java:765)
      	at org.apache.activemq.store.kahadb.plist.PList$PListIterator.remove(PList.java:225)
      	at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.remove(FilePendingMessageCursor.java:517)
      	at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.remove(FilePendingMessageCursor.java:321)
      	at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1061)
      	at org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:804)
      	at org.apache.activemq.broker.region.Queue.access$100(Queue.java:91)
      	at org.apache.activemq.broker.region.Queue$2.run(Queue.java:131)
      	at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
      	at java.util.TimerThread.mainLoop(Timer.java:512)
      	at java.util.TimerThread.run(Timer.java:462)
       

      Seems to be a variant of https://issues.apache.org/jira/browse/AMQ-3434 and Gary put together a test case which we believe represents the problem.

        Gliffy Diagrams

          Activity

          Hide
          garytully Gary Tully added a comment -

          I have a test case that runs for <5mins and always shows a problem, but the symptoms vary (From iterators reading too much to chunk stream not exist to NPE). I am getting closer to the root cause, but don't yet have it narrowed doen completely.

          Show
          garytully Gary Tully added a comment - I have a test case that runs for <5mins and always shows a problem, but the symptoms vary (From iterators reading too much to chunk stream not exist to NPE). I am getting closer to the root cause, but don't yet have it narrowed doen completely.
          Hide
          joe.luo Joe Luo added a comment -

          Sometimes, following stack trace can also be seen:

          2012-08-08 20:49:47,918 [n.ch] Scheduler] ERROR Queue - Problem retrieving message for browse
          java.util.NoSuchElementException: Chunk stream does not exist, page: 75281 is marked free
          at org.apache.kahadb.index.ListNode$ListIterator.getFromNextNode(ListNode.java:165)
          at org.apache.kahadb.index.ListNode$ListIterator.hasNext(ListNode.java:176)
          at org.apache.activemq.store.kahadb.plist.PList$PListIterator.hasNext(PList.java:204)
          at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.hasNext(FilePendingMessageCursor.java:503)
          at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.hasNext(FilePendingMessageCursor.java:297)
          at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1052)
          at org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:804)
          at org.apache.activemq.broker.region.Queue.access$100(Queue.java:91)
          at org.apache.activemq.broker.region.Queue$2.run(Queue.java:131)
          at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
          at java.util.TimerThread.mainLoop(Timer.java:512)
          at java.util.TimerThread.run(Timer.java:462)
          Caused by: java.io.EOFException: Chunk stream does not exist, page: 75281 is marked free
          at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:455)
          at org.apache.kahadb.page.Transaction$2.<init>(Transaction.java:432)
          at org.apache.kahadb.page.Transaction.openInputStream(Transaction.java:429)
          at org.apache.kahadb.page.Transaction.load(Transaction.java:405)
          at org.apache.kahadb.page.Transaction.load(Transaction.java:362)
          at org.apache.kahadb.index.ListIndex.loadNode(ListIndex.java:306)

          Show
          joe.luo Joe Luo added a comment - Sometimes, following stack trace can also be seen: 2012-08-08 20:49:47,918 [n.ch] Scheduler] ERROR Queue - Problem retrieving message for browse java.util.NoSuchElementException: Chunk stream does not exist, page: 75281 is marked free at org.apache.kahadb.index.ListNode$ListIterator.getFromNextNode(ListNode.java:165) at org.apache.kahadb.index.ListNode$ListIterator.hasNext(ListNode.java:176) at org.apache.activemq.store.kahadb.plist.PList$PListIterator.hasNext(PList.java:204) at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor$DiskIterator.hasNext(FilePendingMessageCursor.java:503) at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.hasNext(FilePendingMessageCursor.java:297) at org.apache.activemq.broker.region.Queue.doBrowse(Queue.java:1052) at org.apache.activemq.broker.region.Queue.expireMessages(Queue.java:804) at org.apache.activemq.broker.region.Queue.access$100(Queue.java:91) at org.apache.activemq.broker.region.Queue$2.run(Queue.java:131) at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) Caused by: java.io.EOFException: Chunk stream does not exist, page: 75281 is marked free at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:455) at org.apache.kahadb.page.Transaction$2.<init>(Transaction.java:432) at org.apache.kahadb.page.Transaction.openInputStream(Transaction.java:429) at org.apache.kahadb.page.Transaction.load(Transaction.java:405) at org.apache.kahadb.page.Transaction.load(Transaction.java:362) at org.apache.kahadb.index.ListIndex.loadNode(ListIndex.java:306)
          Hide
          garytully Gary Tully added a comment -

          An additional test that breaks and can demonstrate the problem:
          org.apache.activemq.store.kahadb.plist.PListTest#testConcurrentAddIterate

          Show
          garytully Gary Tully added a comment - An additional test that breaks and can demonstrate the problem: org.apache.activemq.store.kahadb.plist.PListTest#testConcurrentAddIterate
          Hide
          garytully Gary Tully added a comment -

          great news on this one, looks like the problem is in reading and seek. Top find tim.

          Show
          garytully Gary Tully added a comment - great news on this one, looks like the problem is in reading and seek. Top find tim.
          Hide
          sjavurek Susan Javurek added a comment -

          Hi Tim,

          It looks like a new snapshot built last night:

          apache-activemq-5.5.1.fuse-20120822.002833-49-bin.tar.gz.md5 Wed Aug 22 09:07:15 UTC 2012 32

          Will it be in there or do you think we need to wait for the next one?

          Susan

          Show
          sjavurek Susan Javurek added a comment - Hi Tim, It looks like a new snapshot built last night: apache-activemq-5.5.1.fuse-20120822.002833-49-bin.tar.gz.md5 Wed Aug 22 09:07:15 UTC 2012 32 Will it be in there or do you think we need to wait for the next one? Susan

            People

            • Assignee:
              tabish121 Timothy Bish
              Reporter:
              sjavurek Susan Javurek
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: