JGroups / JGRP-1155

Socket Exceptions on Coordinator after adding start_port in FD_SOCK


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Fix Version/s: 2.10
    • Affects Version/s: 2.6.10, 2.6.13
    • None

      I was able to reproduce this with the Draw demo program on both 2.6.10.GA and 2.6.13.GA, using this command:
      java -cp .:log4j.properties:log4j.jar:jgroups-all.jar:commons-logging.jar org.jgroups.demos.Draw -props kenTcp.xml

      In log4j.properties, the org.jgroups logger was set to WARN, so only WARN and above was logged.

      While tailing jgroups.log, I eventually saw the following messages printed every so often, and ONLY on the coordinator:

      2010-02-16 15:49:43,144 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:50167 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Broken pipe
      2010-02-16 15:49:43,144 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:50167 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Broken pipe
      2010-02-16 15:50:07,624 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:38803 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Socket closed
      2010-02-16 15:50:07,624 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:38803 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Socket closed
      2010-02-16 15:50:33,608 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:44940 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Socket closed
      2010-02-16 15:50:33,608 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:44940 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Socket closed
      2010-02-16 15:50:33,611 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:55279 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Broken pipe
      2010-02-16 15:50:33,611 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:55279 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Broken pipe
      2010-02-16 15:50:35,115 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:42406 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Broken pipe
      2010-02-16 15:50:35,115 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802 [135.9.147.170:42406 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803: java.net.SocketException: Broken pipe
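      A small sketch for summarizing an excerpt like the one above: it extracts the local ephemeral port and the send target from each ConnectionTable "failed sending data" line. The class name FailedSendScan and its regular expression are hypothetical, written against the log format shown here rather than any documented JGroups format. Run over the ten lines above, it should collapse the duplicated lines into five distinct local ports (50167, 38803, 44940, 55279, 42406) and a single target, 135.9.147.158:7803.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of JGroups): summarizes ConnectionTable
// "failed sending data" lines in the format shown in this report.
class FailedSendScan {
    // Group 1: local ephemeral port from "[local:port - remote:port]".
    // Group 2: target address from "failed sending data to host:port:".
    private static final Pattern LINE = Pattern.compile(
        "\\[[0-9.]+:(\\d+) - [0-9.]+:\\d+\\].*failed sending data to ([0-9.]+:\\d+):");

    // Distinct local ports in order of first appearance; repeated log lines collapse.
    static List<Integer> localPorts(List<String> lines) {
        Set<Integer> ports = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                ports.add(Integer.parseInt(m.group(1)));
            }
        }
        return new ArrayList<>(ports);
    }

    // Distinct send targets across all matching lines.
    static Set<String> targets(List<String> lines) {
        Set<String> result = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                result.add(m.group(2));
            }
        }
        return result;
    }

    // Usage (Java 11+ single-file mode): java FailedSendScan.java jgroups.log
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("local ports: " + localPorts(lines));
        System.out.println("targets:     " + targets(lines));
    }
}
```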

      Each attempt uses a different ephemeral local port. However, the output of "netstat -an | grep 780" on the coordinator (FD_SOCK listens on 7803 and the channel on 7802; 135.9.147.158 is the other client) shows that it is using the wrong socket:

      tcp 0 0 ::ffff:135.9.147.170:7802 :::* LISTEN
      tcp 0 0 ::ffff:135.9.147.170:7803 :::* LISTEN
      tcp 0 0 ::ffff:135.9.147.170:7803 ::ffff:135.9.147.158:41549 ESTABLISHED
      tcp 0 0 ::ffff:135.9.147.170:57478 ::ffff:135.9.147.158:7802 ESTABLISHED
      tcp 0 0 ::ffff:135.9.147.170:46929 ::ffff:135.9.147.158:7803 ESTABLISHED
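      To make the netstat listing easier to read, the sketch below labels each line by which of the two configured ports (channel on 7802, FD_SOCK on 7803) appears on which side of the connection. NetstatRoles and its labels are hypothetical, and it assumes the six-column Linux "netstat -an" layout shown above. Note that netstat only shows ports, not the owning protocol: the 46929 -> 135.9.147.158:7803 connection is labelled as outbound to the peer's FD_SOCK port, but that alone does not say whether FD_SOCK's own client or ConnectionTable opened it.

```java
// Hypothetical helper (not part of JGroups): labels Linux "netstat -an" TCP lines
// using the port roles stated in this report.
class NetstatRoles {
    static final String CHANNEL_PORT = "7802"; // TCP start_port in the config
    static final String FD_SOCK_PORT = "7803"; // FD_SOCK start_port in the config

    // The port is whatever follows the last ':' ("*" for an unconnected remote).
    static String port(String address) {
        return address.substring(address.lastIndexOf(':') + 1);
    }

    // Assumes columns: proto recv-q send-q local-address foreign-address state.
    static String describe(String netstatLine) {
        String[] f = netstatLine.trim().split("\\s+");
        String local = port(f[3]);
        String remote = port(f[4]);
        String state = f[5];
        if (state.equals("LISTEN")) {
            if (local.equals(CHANNEL_PORT)) return "listening: channel (TCP)";
            if (local.equals(FD_SOCK_PORT)) return "listening: FD_SOCK server";
            return "listening: other";
        }
        // Ports identify the endpoints only, not which protocol owns the socket.
        if (local.equals(FD_SOCK_PORT)) return "inbound to our FD_SOCK server";
        if (local.equals(CHANNEL_PORT)) return "inbound to our channel";
        if (remote.equals(CHANNEL_PORT)) return "outbound to peer's channel port";
        if (remote.equals(FD_SOCK_PORT)) return "outbound to peer's FD_SOCK port";
        return "other";
    }

    public static void main(String[] args) {
        String[] lines = {
            "tcp 0 0 ::ffff:135.9.147.170:7802 :::* LISTEN",
            "tcp 0 0 ::ffff:135.9.147.170:7803 :::* LISTEN",
            "tcp 0 0 ::ffff:135.9.147.170:7803 ::ffff:135.9.147.158:41549 ESTABLISHED",
            "tcp 0 0 ::ffff:135.9.147.170:57478 ::ffff:135.9.147.158:7802 ESTABLISHED",
            "tcp 0 0 ::ffff:135.9.147.170:46929 ::ffff:135.9.147.158:7803 ESTABLISHED",
        };
        for (String line : lines) {
            System.out.println(describe(line) + "  <-  " + line);
        }
    }
}
```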

      Protocol stack in XML (on each member, TCPPING initial_hosts is set to the address of the other JGroups member):
      <config>
      <TCP start_port="7802"
      loopback="false"
      recv_buf_size="20000000"
      send_buf_size="640000"
      discard_incompatible_packets="false"
      max_bundle_size="128000"
      max_bundle_timeout="100"
      use_incoming_packet_handler="true"
      enable_bundling="true"
      use_send_queues="true"
      sock_conn_timeout="300"
      skip_suspected_members="true"

      use_concurrent_stack="true"

      thread_pool.enabled="true"
      thread_pool.min_threads="2"
      thread_pool.max_threads="10"
      thread_pool.keep_alive_time="5000"
      thread_pool.queue_enabled="false"
      thread_pool.queue_max_size="1000"
      thread_pool.rejection_policy="run"

      oob_thread_pool.enabled="true"
      oob_thread_pool.min_threads="2"
      oob_thread_pool.max_threads="10"
      oob_thread_pool.keep_alive_time="5000"
      oob_thread_pool.queue_enabled="false"
      oob_thread_pool.queue_max_size="1000"
      oob_thread_pool.rejection_policy="run"/>

      <TCPPING timeout="3000"
      initial_hosts="135.9.147.170[7800]"
      port_range="4"/>
      <MERGE2 max_interval="100000"
      min_interval="20000"/>
      <FD_SOCK start_port="7803"/>
      <FD timeout="10000" max_tries="5" shun="true"/>
      <VERIFY_SUSPECT timeout="1500" />
      <BARRIER />
      <pbcast.NAKACK
      use_mcast_xmit="false" gc_lag="0"
      retransmit_timeout="300,600,1200,2400,4800"
      discard_delivered_msgs="true"/>
      <UNICAST timeout="300,600,1200" />
      <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
      max_bytes="400000"/>
      <VIEW_SYNC avg_send_interval="60000"/>
      <pbcast.GMS print_local_addr="true" join_timeout="3000"
      shun="true"
      view_bundling="true"/>
      <FC max_credits="2000000"
      min_threshold="0.10"/>
      <FRAG2 frag_size="60000" />
      <pbcast.STREAMING_STATE_TRANSFER/>
      <!-- <pbcast.STATE_TRANSFER/> -->
      </config>
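      One configuration detail that may be worth checking alongside the stack above: TCPPING probes each initial_hosts entry across a range of ports, and starting at 7800 with port_range="4" that range reaches 7803, the FD_SOCK start_port. The sketch below expands an initial_hosts string (without any stray trailing brace) to show the arithmetic. InitialHostsCheck is hypothetical, and it assumes ports port through port+port_range are probed; whether the upper bound is inclusive may differ between JGroups versions, but 7803 is covered either way. If the coordinator's copy names 135.9.147.158[7800], the same arithmetic covers 135.9.147.158:7803, the target of the failed sends. This illustrates the port overlap only; it is not a confirmed cause of the exceptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not JGroups code): expands a TCPPING-style initial_hosts
// string ("host[port],host[port]") into host:port pairs for a given port_range.
// ASSUMPTION: ports port .. port+portRange (inclusive) are probed.
class InitialHostsCheck {
    static List<String> probed(String initialHosts, int portRange) {
        List<String> result = new ArrayList<>();
        for (String raw : initialHosts.split(",")) {
            String entry = raw.trim();
            int open = entry.indexOf('[');
            int close = entry.indexOf(']', open);
            if (open < 0 || close < 0) {
                throw new IllegalArgumentException("expected host[port]: " + entry);
            }
            String host = entry.substring(0, open);
            int port = Integer.parseInt(entry.substring(open + 1, close));
            for (int p = port; p <= port + portRange; p++) {
                result.add(host + ":" + p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> targets = probed("135.9.147.170[7800]", 4);
        String fdSock = "135.9.147.170:7803"; // FD_SOCK start_port on that member
        System.out.println("probed: " + targets);
        System.out.println("includes FD_SOCK port: " + targets.contains(fdSock));
    }
}
```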

            Assignee: Bela Ban
            Reporter: Ken Michie (Inactive)