JGroups / JGRP-1667

OutOfMemoryError - messages are piling up

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: 3.4
    • Affects Version/s: 2.12.3
    • None

    Description

      One of our customers encountered an OutOfMemoryError (OOME) in production on a 5-node Infinispan cluster. The crash happened after about a month of uptime.

      Stacktrace:

      Timer-19,_threadNameOmmitted_32726 tid=188 [RUNNABLE] [DAEMON] <--- OutOfMemoryError happened in this thread
      java.lang.OutOfMemoryError.<init>()
      org.jgroups.blocks.TCPConnectionMap$TCPConnection.send(byte[], int, int)
      org.jgroups.blocks.TCPConnectionMap$TCPConnection.access$100(TCPConnectionMap$TCPConnection, byte[], int, int)
      org.jgroups.blocks.TCPConnectionMap.send(Address, byte[], int, int)
      org.jgroups.protocols.TCP.send(Address, byte[], int, int)
      org.jgroups.protocols.BasicTCP.sendUnicast(PhysicalAddress, byte[], int, int)
      org.jgroups.protocols.TP.sendToSingleMember(Address, byte[], int, int)
      org.jgroups.protocols.TP.doSend(Buffer, Address, boolean)
      org.jgroups.protocols.TP.send(Message, Address, boolean)
      org.jgroups.protocols.TP.down(Event)
      org.jgroups.protocols.Discovery.down(Event)
      org.jgroups.protocols.TCPPING.down(Event)
      org.jgroups.protocols.MERGE2.down(Event)
      org.jgroups.protocols.FD_SOCK.down(Event)
      org.jgroups.protocols.FD.down(Event)
      org.jgroups.protocols.VERIFY_SUSPECT.down(Event)
      org.jgroups.protocols.pbcast.NAKACK.down(Event)
      org.jgroups.protocols.UNICAST.retransmit(long, Message)
      org.jgroups.stack.AckSenderWindow.retransmit(long, long, Address)
      org.jgroups.stack.DefaultRetransmitter$SeqnoTask.callRetransmissionCommand()
      org.jgroups.stack.Retransmitter$Task.run()
      org.jgroups.util.TimeScheduler2$MyTask.run()
      org.jgroups.util.TimeScheduler2$Entry.execute()
      org.jgroups.util.TimeScheduler2$1.run()
      java.lang.Thread.run()

      Looking at the heap dump, I can see about 470 MB of retained heap held by an instance of "org.jgroups.blocks.TCPConnectionMap$TCPConnection$Sender". Most of that memory is retained by instances of "byte[]". I therefore assume that messages are somehow piling up and eventually causing the OOME, but I don't know what conditions might trigger this behavior.
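
      To illustrate the kind of pile-up suspected above, here is a minimal, generic sketch. It is not the TCPConnectionMap code. It assumes that a per-connection sender buffers outgoing byte[] payloads in a queue and that the peer has stalled, so nothing drains the queue. The class and method names (SendQueueSketch, enqueueAll) are hypothetical. With no capacity limit the queue retains every payload; with a limit, excess payloads are rejected instead of retained.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SendQueueSketch {

    /**
     * Offers `count` payloads of `payloadSize` bytes to the queue while no
     * consumer is draining it (modelling a stalled peer). Returns the number
     * of payload bytes the queue ends up retaining.
     */
    static long enqueueAll(BlockingQueue<byte[]> queue, int count, int payloadSize) {
        long retainedBytes = 0;
        for (int i = 0; i < count; i++) {
            // offer() never blocks; it returns false when a bounded queue is full
            if (queue.offer(new byte[payloadSize])) {
                retainedBytes += payloadSize;
            }
        }
        return retainedBytes;
    }

    public static void main(String[] args) {
        int messages = 10_000;
        int payloadSize = 1024;

        // No capacity limit: every payload stays reachable from the queue
        long unbounded = enqueueAll(new LinkedBlockingQueue<byte[]>(), messages, payloadSize);

        // Capacity of 100 entries: payloads beyond the limit are dropped
        long bounded = enqueueAll(new ArrayBlockingQueue<byte[]>(100), messages, payloadSize);

        System.out.println("unbounded queue retains " + unbounded + " bytes");
        System.out.println("bounded queue retains " + bounded + " bytes");
    }
}
```

      In a heap dump, the unbounded case would look much like the one described here: a single sender-side object retaining a large number of byte[] instances. Whether this is what actually happened in this cluster is not established by the report.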

          People

            Assignee: Bela Ban
            Reporter: Matthew Lowe (Inactive)