JGroups / JGRP-1665

UNICAST3 / NAKACK2: problem with flow control and message batching

Details

    • Bug
    • Resolution: Done
    • Critical
    • 3.3.4, 3.4
    • None
    • None
      • Start a state provider with 900K of state: jt LargeState -size 900000 -provider -props ./fast.xml -name A
      • Start a state requester: jt LargeState -props ./fast.xml -name B. This client will block forever: the state requester runs out of credits, and although the requester does send new credits, they are not processed because they sit behind the state request messages in the requester's queue.
      • (The attached fast.xml sets max_credits to 800K in UFC.)
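
The numbers in these steps can be checked with a small credit counter. This is only a sketch under stated assumptions, not JGroups code: the class CreditSketch, the method bytesSentBeforeBlocking and the 60,000-byte chunk size are hypothetical, 800K is taken to mean 800,000 bytes, and no replenishment is applied while the sender is inside the send loop (the situation the report describes).

```java
final class CreditSketch {

    /**
     * Returns how many bytes a sender gets out before it would have to wait
     * for credits, assuming no credits are replenished in the meantime.
     */
    static long bytesSentBeforeBlocking(long stateSize, long maxCredits, int chunkSize) {
        long credits = maxCredits;
        long sent = 0;
        while (sent < stateSize) {
            long next = Math.min(chunkSize, stateSize - sent);
            if (next > credits) {
                break; // a real sender would block here until credits arrive
            }
            credits -= next;
            sent += next;
        }
        return sent;
    }

    public static void main(String[] args) {
        long stateSize = 900_000;   // -size 900000 from the steps above
        long maxCredits = 800_000;  // assumed meaning of "800K of max_credits"
        long sent = bytesSentBeforeBlocking(stateSize, maxCredits, 60_000);
        System.out.println("sent before blocking: " + sent
                + ", still unsent: " + (stateSize - sent));
    }
}
```

With a state smaller than max_credits the loop runs to completion, which is why the problem only shows up once the state exceeds the available credits.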

    Description

      When UNICAST3 or NAKACK2 receives a message and passes it up to the application, and the application in turn sends a large amount of data down the stack, the sending thread may block in the flow control protocol (UFC or MFC).
      In https://issues.jboss.org/browse/JGRP-1655 we removed ignore_sync_response, which let sync responses pass through flow control, on the assumption that credit responses would be received as OOB messages.

      However, when credits arrive as part of a message batch, they are received in an internal batch, not an OOB batch.

      This means that they are delivered sequentially (since they're from the same sender) and are thus not applied until the blocking request returns, which never happens because the request is waiting for those very credits: deadlock!
      The reason is that credits are marked as INTERNAL (and also OOB), but the code that reads batches adds all INTERNAL|OOB messages to the internal batch. That batch is processed by the internal pool, but because the batch itself is not tagged as OOB, its messages are not processed immediately.
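
The classification described above can be modelled in a few lines. This is a simplified sketch, not the actual JGroups batch-reading code: the Flag and Mode enums and the batchFor method are hypothetical stand-ins, and the only rule taken from the report is that a message carrying INTERNAL (with or without OOB) ends up in the internal batch.

```java
import java.util.EnumSet;
import java.util.Set;

final class BatchClassificationSketch {

    enum Flag { OOB, INTERNAL }          // hypothetical stand-ins for message flags
    enum Mode { REG, OOB, INTERNAL }     // hypothetical stand-ins for batch modes

    /** INTERNAL takes precedence over OOB, as the report describes. */
    static Mode batchFor(Set<Flag> flags) {
        if (flags.contains(Flag.INTERNAL)) {
            return Mode.INTERNAL;
        }
        if (flags.contains(Flag.OOB)) {
            return Mode.OOB;
        }
        return Mode.REG;
    }

    public static void main(String[] args) {
        Set<Flag> credit = EnumSet.of(Flag.OOB, Flag.INTERNAL); // how credits are flagged
        Set<Flag> plainOob = EnumSet.of(Flag.OOB);
        System.out.println("credit message -> " + batchFor(credit));
        System.out.println("plain OOB message -> " + batchFor(plainOob));
    }
}
```

In this model a credit message lands in a batch whose mode is INTERNAL, so any code path that only looks for OOB batches will not recognise it.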

      SOLUTION: treat INTERNAL the same as OOB, e.g. check for OOB and INTERNAL:

      if(batch.mode() == INTERNAL || batch.mode() == OOB)
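
To make the effect of that check concrete, here is a hedged, self-contained comparison of the two predicates. The Mode enum and the method names immediateBefore and immediateAfter are hypothetical; the sketch assumes, as the report states, that previously only OOB batches were handled immediately.

```java
final class BatchModeCheckSketch {

    enum Mode { REG, OOB, INTERNAL }  // hypothetical stand-in for the batch mode

    /** Behaviour as described for the faulty version: only OOB is immediate. */
    static boolean immediateBefore(Mode mode) {
        return mode == Mode.OOB;
    }

    /** Proposed check: INTERNAL is treated the same as OOB. */
    static boolean immediateAfter(Mode mode) {
        return mode == Mode.INTERNAL || mode == Mode.OOB;
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + ": before=" + immediateBefore(m)
                    + ", after=" + immediateAfter(m));
        }
    }
}
```

Only the INTERNAL case changes between the two predicates, which is the case credit batches fall into.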
      

      An alternative would be to revisit the idea of the internal thread pool: is it really needed? If not, it could be removed.

      Git bisect shows the faulty commit on July 16:

      [linux]/home/bela/JGroups$ git bisect good
      87cf70d936bb3b15860bf4cd89fe2fff49e85d1e is the first bad commit
      commit 87cf70d936bb3b15860bf4cd89fe2fff49e85d1e
      Author: Bela Ban <belaban@yahoo.com>
      Date:   Tue Jul 16 14:51:10 2013 +0200
      
          Removed ignore_synchronous_thread and ignore_thread from FlowControl; not needed anymore as we cannot block anymore on credit responses (https://issues.jboss.org/browse/JGRP-1655)
      
      :040000 040000 d0e4647ab4b89d320e45a460f0385311e2e4eba0 fefac0eb21ab3609ccf440eec9ec8f659a819f6b M      src
      
      

            People

              rhn-engineering-bban Bela Ban
              rhn-engineering-bban Bela Ban