  1. jBPM
  2. JBPM-4445

Process gets stuck after five iterations of a parallel loop

      We have a larger workflow process that sometimes gets stuck at a parallel join node.
      The workflow can pass through this parallel loop several times, depending on the human task outputs (typically approval decisions).

      We attached a boundary signal event to each human task to catch a rejection in the other parallel branch.
      Each human task checks the approval decision in its On Exit script and, if needed, sends a signal to the other branch's human task.
      This way, a rejection exits the parallel part of the workflow immediately.

      I attached a simplified version of this workflow, a test case, and modified jbpm601 source files for debugging in a zip archive.

      During application testing and JUnit tests we found that the problem occurs only when the parallel human task nodes in each branch are executed alternately (JbpmParalellLoopTest.testProcessAlternateReject()).
      In this situation the process executes five loops correctly, but after the fifth loop it gets stuck at the Parallel_End node.
      If we always reject the same human task (on a given branch) in the loop, the problem doesn't occur. See the other test cases in the JbpmParalellLoopTest class.
      We tested the BPMN file against jBPM API 6.0.1.Final and 6.1.0.Final, and both fail.

      I tried to find out what the problem could be by adding extra debug lines to the existing jBPM source code, and found the following:

      • WorkflowProcessInstanceImpl.getFirstNodeInstance doesn't find the node instance it searches for, because nodeInstance.getLevel() and getCurrentLevel() differ.
        • When the problem occurs, the node instance with the searched nodeId is in the current context, but its level is different.
      • ReuseNodeFactory.getNodeInstance calls the above getFirstNodeInstance method.
        • Because no NodeInstance is found, the method returns null, and getNodeInstance creates a new JoinInstance...
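
The lookup behaviour described above can be modelled in isolation. The following is a simplified sketch and not the jBPM source: the classes NodeInstance and ProcessInstance and the method getOrCreate are hypothetical stand-ins that keep only the node id and level mentioned in this report.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelLookupSketch {

    /** Hypothetical stand-in for a node instance: only the fields the report mentions. */
    static final class NodeInstance {
        final long nodeId;
        final int level;

        NodeInstance(long nodeId, int level) {
            this.nodeId = nodeId;
            this.level = level;
        }
    }

    /** Hypothetical stand-in for the process instance that owns the node instances. */
    static final class ProcessInstance {
        final List<NodeInstance> nodeInstances = new ArrayList<>();
        int currentLevel;

        // Models the described check: a candidate must match the node id AND the current level.
        NodeInstance getFirstNodeInstance(long nodeId) {
            for (NodeInstance ni : nodeInstances) {
                if (ni.nodeId == nodeId && ni.level == currentLevel) {
                    return ni;
                }
            }
            return null;
        }

        // Models the reuse step: return the existing instance, otherwise create a fresh one.
        NodeInstance getOrCreate(long nodeId) {
            NodeInstance existing = getFirstNodeInstance(nodeId);
            if (existing != null) {
                return existing;
            }
            NodeInstance created = new NodeInstance(nodeId, currentLevel);
            nodeInstances.add(created);
            return created;
        }
    }

    public static void main(String[] args) {
        ProcessInstance p = new ProcessInstance();
        p.currentLevel = 1;
        NodeInstance join = p.getOrCreate(42L);
        System.out.println(p.getOrCreate(42L) == join); // true: same level, instance reused

        p.currentLevel = 2; // the level no longer matches the stored instance
        System.out.println(p.getOrCreate(42L) == join); // false: a second instance is created
    }
}
```

In this model the stored instance is still present in the list, yet the level filter hides it, which matches the symptom seen in the debug output.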

      In a correct loop we get back the previously used and initialized JoinInstance, whose triggers map records that the other branch has already been executed.
      But now we get a new JoinInstance, and this instance has a brand new internal triggers map that doesn't know the other branch has already been executed.
      So when JoinInstance.internalTrigger is called, the checkAllActivated method inside it returns false.
      That's why the parallel join doesn't finish/exit, and the process gets stuck at this point.
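
To illustrate why a fresh join instance never completes, here is a minimal sketch of an AND-join that fires only once every incoming branch has triggered it. It is an assumption-laden model rather than the real JoinInstance: the constructor argument incomingBranches and the exact shape of the triggers map are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class JoinTriggerSketch {

    /** Hypothetical AND-join: completes only when every incoming branch has triggered it. */
    static final class JoinInstance {
        private final int incomingBranches;
        // Counts triggers per source node id; a new instance starts with an empty map.
        private final Map<Long, Integer> triggers = new HashMap<>();

        JoinInstance(int incomingBranches) {
            this.incomingBranches = incomingBranches;
        }

        // Records the trigger from one branch and reports whether the join can now fire.
        boolean internalTrigger(long fromNodeId) {
            triggers.merge(fromNodeId, 1, Integer::sum);
            return checkAllActivated();
        }

        boolean checkAllActivated() {
            return triggers.size() == incomingBranches;
        }
    }

    public static void main(String[] args) {
        // Expected case: both branches arrive at the same join instance.
        JoinInstance reused = new JoinInstance(2);
        reused.internalTrigger(1L);
        System.out.println(reused.internalTrigger(2L)); // true

        // Failing case: the second branch arrives at a newly created instance.
        JoinInstance first = new JoinInstance(2);
        first.internalTrigger(1L);
        JoinInstance fresh = new JoinInstance(2);
        System.out.println(fresh.internalTrigger(2L)); // false: each instance saw one branch
    }
}
```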

      I tried hacking the getFirstNodeInstance code to skip the level check if the node is my sample process's 'Parallel_End' node. With this modification everything worked, and the loop ran fine more than five times.
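
A debugging workaround of this kind might look roughly like the sketch below. It is not the modified jBPM code from the attached archive: the helper findFirst, its levelExemptNode parameter, and matching by node name instead of node id are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelCheckBypassSketch {

    /** Hypothetical stand-in for a node instance, identified by name for readability. */
    static final class NodeInstance {
        final String nodeName;
        final int level;

        NodeInstance(String nodeName, int level) {
            this.nodeName = nodeName;
            this.level = level;
        }
    }

    // Hypothetical lookup: the level must match unless the node is the single exempted one.
    static NodeInstance findFirst(List<NodeInstance> instances, String nodeName,
                                  int currentLevel, String levelExemptNode) {
        for (NodeInstance ni : instances) {
            if (!ni.nodeName.equals(nodeName)) {
                continue;
            }
            boolean levelMatches = ni.level == currentLevel;
            if (levelMatches || nodeName.equals(levelExemptNode)) {
                return ni;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<NodeInstance> instances = new ArrayList<>();
        instances.add(new NodeInstance("Parallel_End", 1));

        // Strict lookup at level 2 misses the instance stored at level 1.
        System.out.println(findFirst(instances, "Parallel_End", 2, null) == null);           // true
        // With the exemption the stored instance is found again and can be reused.
        System.out.println(findFirst(instances, "Parallel_End", 2, "Parallel_End") != null); // true
    }
}
```

Such an exemption only hides the symptom for one node, so it is useful for confirming the diagnosis but not as a general fix.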
      I hope the above description helps you to reproduce and fix the problem.

      Thanks for your help!

            swiderski.maciej Maciej Swiderski (Inactive)
            szabolcs.eory_jira Szabolcs Eory (Inactive)