JGroups / JGRP-1464

TCPConnectionMap: message from different JGroups version may cause OOME

Details

    • Bug
    • Resolution: Done
    • Major
    • 3.0.10, 3.1
    • None
    • None

    Description

      If discard_incompatible_packets is enabled at the transport level, we discard packets from other JGroups versions. However, when TCP is used, this check is not performed at the TCPConnectionMap level.

      When we read a packet from a different version, we first read the version, then the length. If the length is garbage and we interpret it as a long, it can be huge; allocating a buffer of that size then leads to an OOME.
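The hazard can be illustrated with a minimal Java sketch. It is not the TCPConnectionMap code: the class name, the `MAX_FRAME_LENGTH` bound, and the sample bytes are assumptions chosen for illustration. The four bytes are simply ones whose big-endian value equals the `len` reported in the log below; the real bytes on the wire are not known. The point is that any four bytes from a misaligned or foreign stream form a valid `int`, so the length should be checked before it is used to size a buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FrameLengthDemo {
    // Hypothetical upper bound on a single frame; not a JGroups setting.
    static final int MAX_FRAME_LENGTH = 8 * 1024 * 1024;

    /** Reads a 4-byte big-endian length prefix the way DataInputStream.readInt() does. */
    static int readLength(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true only if the length is safe to use for allocating a buffer. */
    static boolean isPlausible(int len) {
        return len >= 0 && len <= MAX_FRAME_LENGTH;
    }

    public static void main(String[] args) {
        // Illustrative bytes only: 0x47118A04 is 1192331780 in decimal.
        byte[] garbage = {0x47, 0x11, (byte) 0x8A, 0x04};
        int len = readLength(garbage);
        // Without the check, "new byte[len]" would request more than 1 GB.
        System.out.println("len=" + len + " plausible=" + isPlausible(len));
    }
}
```

A bound like this only limits the damage. It does not make a stream from an incompatible version readable.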

      email from http://old.nabble.com/ArrayIndexOutOfBoundsException-in-3.09-ts33781725.html#a33852559
      I'd like to add some additional information to this post since I am seeing this crash the entire JVM. I've added some debugging statements to the TCPConnectionMap - ConnectionPeerReceiver.run method to try to understand the issue and recompiled the 3.0.9 jar (I am printing the len read from the DataInputStream). It appears that when an incompatible message is received, JG discards some messages appropriately but not others. Some of the messages that make it through end up being of enormous size, 1192331780 bytes (or ~1.1 GB), which is enough to crash the JVM due to OOM (catching the OOMError is not sufficient in our configuration). Some are of size 0 as well... curious if there is some signed/unsigned conversion going on here? At any rate, this is prohibiting us from moving to the latest version of JGroups in hopes of receiving better support.

      2012-05-14 09:27:05,623 [Connection.Receiver [135.9.96.63:59953 - 135.9.128.31:7800],135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=70
      2012-05-14 09:27:05,624 [Connection.Receiver [135.9.96.63:59953 - 135.9.128.31:7800],135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=70
      2012-05-14 09:27:05,624 [OOB-1,135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.protocols.TCP WARN - packet from 135.9.128.31:7800 has different version (2.6.10) from ours (3.0.10). Packet is discarded
      2012-05-14 09:27:05,625 [OOB-2,135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.protocols.TCP WARN - packet from 135.9.128.31:7800 has different version (2.6.10) from ours (3.0.10). Packet is discarded
      2012-05-14 09:27:12,028 [ConnectionMap.Acceptor,null,null] jgroups.blocks.TCPConnectionMap$TCPConnection WARN - packet from /135.9.96.59:56077 has different version (2.6.10) from ours (3.0.10). This may cause problems
      2012-05-14 09:27:12,030 [Connection.Receiver [135.9.96.63:7800 - 135.9.96.59:56077],135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=0
      2012-05-14 09:27:12,032 [Connection.Receiver [135.9.96.63:7800 - 135.9.96.59:56077],135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.protocols.TCP ERROR - failed handling data from 135.9.96.59:7800
      java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 2
      at org.jgroups.protocols.TP.receive(TP.java:1200)
      at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:104)
      at org.jgroups.blocks.TCPConnectionMap$TCPConnection$ConnectionPeerReceiver.run(TCPConnectionMap.java:603)
      at java.lang.Thread.run(Thread.java:769)
      2012-05-14 09:27:12,034 [Connection.Receiver [135.9.96.63:7800 - 135.9.96.59:56077],135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=1192331780

      SOLUTION: discard packets from different JGroups versions in TCPConnectionMap.
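One way to picture the proposed check is the sketch below. It is an assumption-laden illustration, not the actual fix: the frame layout (a `short` version, then an `int` length, then the payload), the class and method names, the `OUR_VERSION` value, and the `MAX_FRAME_LENGTH` bound are all hypothetical. The version is compared before the length is trusted, and a mismatch is reported as an `IOException` so that the caller can drop the packet or close the connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionedFrameReader {
    static final short OUR_VERSION = 0x030A;              // hypothetical encoding
    static final int MAX_FRAME_LENGTH = 8 * 1024 * 1024;  // hypothetical bound

    /** Reads one frame; throws instead of allocating when the header is suspect. */
    static byte[] readFrame(DataInputStream in) throws IOException {
        short version = in.readShort();
        if (version != OUR_VERSION)
            throw new IOException("incompatible version " + version);
        int len = in.readInt();
        if (len < 0 || len > MAX_FRAME_LENGTH)
            throw new IOException("implausible length " + len);
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }

    /** Convenience wrapper: returns null when the frame is rejected. */
    static byte[] tryRead(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return readFrame(in);
        } catch (IOException e) {
            return null;
        }
    }

    /** Builds a frame in the assumed layout, for demonstration only. */
    static byte[] frame(short version, byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(version);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ok = tryRead(frame(OUR_VERSION, new byte[]{1, 2, 3}));
        byte[] rejected = tryRead(frame((short) 0x0206, new byte[]{1, 2, 3}));
        System.out.println("same version accepted: " + (ok != null));
        System.out.println("other version rejected: " + (rejected == null));
    }
}
```

After a version mismatch the remaining bytes on the stream cannot be framed reliably, so in a sketch like this closing the connection is likely safer than trying to skip ahead to the next packet.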

          People

            rhn-engineering-bban Bela Ban
            rhn-engineering-bban Bela Ban