JGroups / JGRP-503

Retransmitted message too large


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Affects Version: 2.5
    • Fix Version: None
    • Component: None
    • Workaround: Reduce FRAG2.frag_size and NAKACK.max_xmit_size (see the
      sketch after the config below)

    Description

      Read and respond to this message at:
      https://sourceforge.net/forum/message.php?msg_id=4288472
      By: whatagreatuser

      After leaving a 3-member cluster running for ~2 hours, we received batches
      of errors on all the members. Below is the config XML, which is essentially
      udp.xml with some tweaks for a larger thread_pool and oob_thread_pool.

      The Channel is used essentially as an RPC layer, with an RpcDispatcher
      sitting on top of it. At the time of the errors no "user-level" messages
      were being sent, and no state is managed by the channel.

      // channel setup
      channel = new JChannel(jgroups_config_file);
      channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE); // rejoin automatically if the channel is shunned
      channel.addChannelListener(this);
      disp = new RpcDispatcher(channel, null, this, this, true, true);

      // force connect
      channel.connect(clusterName);
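
      A call through this dispatcher would look roughly like the sketch below
      (JGroups 2.x API); the method name, arguments and timeout are illustrative,
      not taken from the report:

      // Hypothetical invocation via the dispatcher above. "ping" stands in for a
      // public method on the server object handed to RpcDispatcher. Uses
      // org.jgroups.blocks.GroupRequest and org.jgroups.util.RspList.
      RspList responses = disp.callRemoteMethods(
              null,                 // null destination list = all cluster members
              "ping",               // method name (hypothetical)
              new Object[0],        // arguments
              new Class[0],         // argument types
              GroupRequest.GET_ALL, // block until every member has replied
              5000);                // timeout in ms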

      ERROR | failed sending message to 10.0.0.64:28701 (163 bytes)

      java.lang.Exception: dest=/10.0.0.64:28701 (65508 bytes)
      at org.jgroups.protocols.UDP._send(UDP.java:345)
      at org.jgroups.protocols.UDP.sendToSingleMember(UDP.java:302)
      at org.jgroups.protocols.TP.doSend(TP.java:1197)
      at org.jgroups.protocols.TP.send(TP.java:1184)
      at org.jgroups.protocols.TP.down(TP.java:957)
      at org.jgroups.protocols.Discovery.down(Discovery.java:325)
      at org.jgroups.protocols.MERGE2.down(MERGE2.java:184)
      at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:406)
      at org.jgroups.protocols.FD.down(FD.java:363)
      at org.jgroups.stack.Protocol.down(Protocol.java:270)
      at org.jgroups.protocols.BARRIER.down(BARRIER.java:94)
      at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:494)
      at org.jgroups.protocols.UNICAST.retransmit(UNICAST.java:467)
      at org.jgroups.stack.AckSenderWindow.retransmit(AckSenderWindow.java:145)
      at org.jgroups.stack.Retransmitter$Entry.run(Retransmitter.java:339)
      at org.jgroups.util.TimeScheduler$TaskWrapper.run(TimeScheduler.java:187)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
      at java.util.concurrent.FutureTask.run(FutureTask.java:123)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
      at java.lang.Thread.run(Thread.java:595)

      Caused by: java.net.SocketException: The message is larger than the maximum
      supported by the underlying transport: Datagram send failed
      at java.net.PlainDatagramSocketImpl.send(Native Method)
      at java.net.DatagramSocket.send(DatagramSocket.java:612)
      at org.jgroups.protocols.UDP._send(UDP.java:341)
      ... 23 more
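
      (For reference: 65507 bytes is the largest payload an IPv4 UDP datagram can
      carry, i.e. 65535 minus 20 bytes of IP header and 8 bytes of UDP header, so
      the 65508-byte retransmission above is one byte over the hard limit.)

      The failure mode itself is easy to reproduce outside JGroups; a minimal
      standalone sketch (class name and target port are made up for illustration):

      import java.io.IOException;
      import java.net.DatagramPacket;
      import java.net.DatagramSocket;
      import java.net.InetAddress;

      // Demonstrates that sending a datagram one byte over the 65507-byte IPv4
      // UDP payload limit fails at the socket level, as in the trace above.
      public class OversizedDatagram {
          public static void main(String[] args) throws Exception {
              DatagramSocket sock = new DatagramSocket();
              byte[] payload = new byte[65508]; // 65507 is the largest sendable payload
              DatagramPacket pkt = new DatagramPacket(
                      payload, payload.length, InetAddress.getByName("127.0.0.1"), 9999);
              try {
                  sock.send(pkt); // expected to throw, e.g. "Message too long"
              } catch (IOException e) {
                  System.out.println("send failed as expected: " + e);
              } finally {
                  sock.close();
              }
          }
      }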

      (the same error and stack trace repeated)

      ERROR | failed sending message to 10.0.0.64:28702 (66863 bytes)

      java.lang.Exception: message size (66935) is greater than max bundling size
      (64000). Set the fragmentation/bundle size in FRAG and TP correctly
      at org.jgroups.protocols.TP$Bundler.checkLength(TP.java:1781)
      at org.jgroups.protocols.TP$Bundler.send(TP.java:1670)
      at org.jgroups.protocols.TP$Bundler.access$100(TP.java:1657)
      at org.jgroups.protocols.TP.send(TP.java:1173)
      at org.jgroups.protocols.TP.down(TP.java:957)
      at org.jgroups.protocols.Discovery.down(Discovery.java:325)
      at org.jgroups.protocols.MERGE2.down(MERGE2.java:184)
      at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:406)
      at org.jgroups.protocols.FD.down(FD.java:363)
      at org.jgroups.stack.Protocol.down(Protocol.java:270)
      at org.jgroups.protocols.BARRIER.down(BARRIER.java:94)
      at org.jgroups.protocols.pbcast.NAKACK.sendXmitRsp(NAKACK.java:840)
      at org.jgroups.protocols.pbcast.NAKACK.handleXmitReq(NAKACK.java:789)
      at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:537)
      at org.jgroups.protocols.BARRIER.up(BARRIER.java:119)
      at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:154)
      at org.jgroups.protocols.FD.up(FD.java:328)
      at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
      at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
      at org.jgroups.protocols.Discovery.up(Discovery.java:224)
      at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1541)
      at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1495)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
      at java.lang.Thread.run(Thread.java:595)
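
      This last error comes from TP's bundler rather than from the socket: NAKACK
      assembles retransmission responses up to max_xmit_size (60000 bytes in this
      config), and once per-message and transport headers are added, the resulting
      message (66935 bytes) can overflow max_bundle_size (64000 bytes). That is
      presumably why the suggested workaround lowers both FRAG2.frag_size and
      NAKACK.max_xmit_size: it leaves headroom for headers below both the bundling
      limit and the 65507-byte datagram limit.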

      <config>
          <UDP mcast_addr="${jgroups.udp.mcast_addr:228.10.10.50}"
               mcast_port="${jgroups.udp.mcast_port:50000}"
               tos="8"
               ucast_recv_buf_size="20000000"
               ucast_send_buf_size="640000"
               mcast_recv_buf_size="25000000"
               mcast_send_buf_size="640000"
               loopback="false"
               discard_incompatible_packets="true"
               max_bundle_size="64000"
               max_bundle_timeout="30"
               use_incoming_packet_handler="true"
               ip_ttl="${jgroups.udp.ip_ttl:2}"
               enable_bundling="true"
               enable_diagnostics="true"
               thread_naming_pattern="cl"
               use_concurrent_stack="true"
               thread_pool.enabled="true"
               thread_pool.min_threads="1"
               thread_pool.max_threads="250"
               thread_pool.keep_alive_time="60000"
               thread_pool.queue_enabled="true"
               thread_pool.queue_max_size="10000"
               thread_pool.rejection_policy="Abort"
               oob_thread_pool.enabled="true"
               oob_thread_pool.min_threads="1"
               oob_thread_pool.max_threads="250"
               oob_thread_pool.keep_alive_time="60000"
               oob_thread_pool.queue_enabled="true"
               oob_thread_pool.queue_max_size="10000"
               oob_thread_pool.rejection_policy="Abort"/>
          <PING timeout="2000" num_initial_members="2"/>
          <MERGE2 max_interval="30000" min_interval="10000"/>
          <FD_SOCK/>
          <FD timeout="10000" max_tries="5" shun="true"/>
          <VERIFY_SUSPECT timeout="1500"/>
          <BARRIER/>
          <pbcast.NAKACK max_xmit_size="60000"
                         use_mcast_xmit="false" gc_lag="0"
                         retransmit_timeout="300,600,1200,2400,4800"
                         discard_delivered_msgs="true"/>
          <UNICAST timeout="300,600,1200,2400,3600"/>
          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                         max_bytes="400000"/>
          <VIEW_SYNC avg_send_interval="60000"/>
          <pbcast.GMS print_local_addr="true" join_timeout="3000"
                      join_retry_timeout="2000" shun="false"
                      view_bundling="true"/>
          <FC max_credits="20000000" min_threshold="0.10"/>
          <FRAG2 frag_size="60000"/>
          <!-- pbcast.STREAMING_STATE_TRANSFER use_reading_thread="true"/-->
          <pbcast.STATE_TRANSFER/>
          <!-- pbcast.FLUSH /-->
      </config>
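
      Applying the suggested workaround (reduce FRAG2.frag_size and
      NAKACK.max_xmit_size), the two affected elements might look like the sketch
      below. The value 30000 is an assumption for illustration, not from the
      report; the point is to keep fragments and retransmission bundles well under
      max_bundle_size (64000) and the 65507-byte UDP datagram limit even after
      headers are added:

      <!-- illustrative workaround values; 30000 is an assumption -->
      <pbcast.NAKACK max_xmit_size="30000"
                     use_mcast_xmit="false" gc_lag="0"
                     retransmit_timeout="300,600,1200,2400,4800"
                     discard_delivered_msgs="true"/>
      <FRAG2 frag_size="30000"/>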

          People

            Assignee: Bela Ban (rhn-engineering-bban)
            Reporter: Bela Ban (rhn-engineering-bban)
