AMQ Broker / ENTMQBR-201

Some messages lost when replicated LevelDB used


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Major

      Deploy Fuse on at least 4 machines, where one is a Fabric server and three are replicated LevelDB brokers.

      1. Start sending N messages (N must be large) and start a consumer which receives those messages.
      2. Repeat until no message is received within a large timeout:
        1. Kill the master container.
        2. Start the master container again.
        3. Sleep for some timeout (sending and receiving continue in another thread).

      Then usually a few messages are not received.

      I was unable to reproduce this issue when deploying all containers on one machine.
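      The check at the end of the procedure amounts to scanning the received sequence numbers for gaps once the soak finishes. The actual harness is in the attached missing-messages-test.zip; the sketch below is a hypothetical, illustrative helper only:

```python
def find_missing(received, total):
    """Return the sequence numbers in 0..total-1 that were never received."""
    seen = set(received)
    return [n for n in range(total) if n not in seen]

# Example: out of 10 messages, 3 and 7 were lost.
print(find_missing([0, 1, 2, 4, 5, 6, 8, 9], 10))  # -> [3, 7]
```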


      When sending and receiving messages from the root Fuse container to a replicated broker while repeatedly killing the master container, some messages are never received.

      In my case I am sending TextMessages containing only a sequence number, from 0 to 100000. This statement sometimes appears in the log after the client reconnects to the newly elected master (after the previous master was killed):

      2014-12-10 16:37:58,996 | WARN  |  Session Task-45 | ActiveMQMessageConsumer          | 181 - org.apache.activemq.activemq-osgi - 5.10.0.redhat-620049 | ID:fuseqe18.os1.phx2.redhat.com-59857-1418225686303-5:1:1:1 suppressing duplicate delivery on connection, poison acking: MessageDispatch {commandId = 0, responseRequired = false, consumerId = ID:fuseqe18.os1.phx2.redhat.com-59857-1418225686303-5:1:1:1, destination = queue://replicated.SOAK-1, message = ActiveMQTextMessage {commandId = 67011, responseRequired = true, messageId = ID:fuseqe18.os1.phx2.redhat.com-59857-1418225686303-3:1:1:1:67007, originalDestination = null, originalTransactionId = null, producerId = ID:fuseqe18.os1.phx2.redhat.com-59857-1418225686303-3:1:1:1, destination = queue://replicated.SOAK-1, transactionId = null, expiration = 0, timestamp = 1418229416988, arrival = 0, brokerInTime = 1418229415818, brokerOutTime = 1418229477637, correlationId = null, replyTo = null, persistent = true, type = null, priority = 4, groupID = null, groupSequence = 0, targetConsumerId = null, compressed = false, userID = null, content = org.apache.activemq.util.ByteSequence@71eb8979, marshalledProperties = null, dataStructure = null, redeliveryCounter = 0, size = 0, properties = null, readOnlyProperties = true, readOnlyBody = true, droppable = false, jmsXGroupFirstForConsumer = false, text = 67006}, redeliveryCounter = 0}
      

      The message mentioned in the log is received by the client but also moved to the DLQ. However, the messages which are lost have a sequence number larger by one (or two, e.g. lost message 99895) than the reported ones.

      In my case these messages are lost:

      9307, 15568, 19855, 24001, 28316, 31999, 33978, 37254, 44810, 51868, 53908, 71453, 67007, 77283, 83515, 99895
      

      These messages are mentioned in the log:

      9306, 15567, 23999, 24000, 28315, 33977, 37253, 48272, 51867, 53907, 57782, 67006, 71452, 77282, 84601, 90848, 99890, 99891, 99892, 99893
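      The off-by-one (or off-by-two) relationship described above can be cross-checked directly against the two lists from this report; a rough, illustrative check (the pattern holds for most, though not all, of the entries):

```python
# Sequence numbers reported lost, and those that appear in the
# duplicate-delivery warnings (both lists copied from this report).
lost = [9307, 15568, 19855, 24001, 28316, 31999, 33978, 37254,
        44810, 51868, 53908, 71453, 67007, 77283, 83515, 99895]
logged = {9306, 15567, 23999, 24000, 28315, 33977, 37253, 48272,
          51867, 53907, 57782, 67006, 71452, 77282, 84601, 90848,
          99890, 99891, 99892, 99893}

# Lost messages whose predecessor (by one or two) triggered a
# duplicate-delivery warning.
matched = [n for n in lost if n - 1 in logged or n - 2 in logged]
unmatched = [n for n in lost if n not in matched]
print(f"{len(matched)} of {len(lost)} lost messages follow a logged duplicate")
print("no logged predecessor:", unmatched)
```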
      

        1. fuse.log
          2.15 MB
        2. missing-messages-test.zip
          8 kB
        3. stdout.txt
          2.50 MB

            Assignee: Unassigned
            Reporter: Jakub Knetl (knetl.j@gmail.com)
