Uploaded image for project: 'XNIO'
  1. XNIO
  2. XNIO-384

Deadlock between `Selector.select` and `WorkerThread.setOps`

    XMLWordPrintable

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • None
    • 3.8.4.Final
    • api, nio-impl
    • None
    • Hide

      Like most threading issues this is really hard to reproduce in testing. In production it's happening several times a day, and the stack trace is always identical. Essentially you need around a thousand active users actively making requests of the server. Typically we see a deadlock several times a day; once we ran for 2.5 days before it locked.

      Show
      Like most threading issues this is really hard to reproduce in testing. In production it's happening several times a day, and the stack trace is always identical. Essentially you need around a thousand active users actively making requests of the server. Typically we see a deadlock several times a day; once we ran for 2.5 days before it locked.
    • Undefined

    Description

      This issue impacts at least the latest JDK available on the IBM-i midrange system, JDK-8 on 7.4.0.

      First some background facts for my situation. IBM provides the only JDKs for the IBM-i midrange systems. They typically lag way behind in version, hence there is only JDK-8 for the latest 7.4 O/S. So in terms of JVM internals my hands are tied (else I would simply tell the customer to upgrade their JDK). Lastly getting a new JDK will likely depend on a new O/S release, which represents a major process in terms of customers upgrading. So I am absolutely stuck with this implementation of JDK-8. I concede that in this it sucks to be me – this one fact has been the bane of my existence for two decades. I can change anything in Undertow or XNIO, but the JVM code is utterly beyond my reach.

      As can be seen from the stack trace, the internal JVM code for `SelectorImpl.select()` (the short trace) exerts a lock first on its `HashMap`, then on the `SelectionKeyImpl`. Meanwhile the internal code for `SelectionKeyImpl.interestOps` (the long trace) does not exert a lock on the key but does lock the `HashMap`. However the XNIO change made for XNIO-303 / XNIO-304, does exert a lock on `SelectionKeyImpl` around its call to `interestOps` and sets up an out-of-order lock scenario with `select`. Hence the deadlock if `select` and `interestOps` are called at precisely the same time in different threads. This means that the added synchronization of the selection key is the direct causation of this deadlock, at least in the IBM JVM.

      Stack Trace Detail

      Thread "JetFusionWorker task-444" (0x0000000030716B00)
        is waiting for: 
          sys_mon_t:0x0000000193967110 infl_mon_t: 0x0000000193967160: 
          java/util/HashMap@0x00000000C908B5D8 
        which is owned by: 
      Thread "JetFusionWorker I/O-18" (0x0000000030849B00) 
        which is waiting for: 
          sys_mon_t:0x00000001938CBC08 infl_mon_t: 0x00000001938CBC58: 
          sun/nio/ch/SelectionKeyImpl@0x00000000CED6A908 
        which is owned by: 
      Thread "JetFusionWorker task-444" (0x0000000030716B00)
      
      "JetFusionWorker I/O-18" J9VMThread:0x0000000030849B00, omrthread_t:0x0000000194D5ED10, java/lang/Thread:0x00000000C908A8D0, state:B, prio=6 
          (java/lang/Thread getId:0x1A1, isDaemon:true) 
          (native thread ID:0x8BD00F, native priority:0x6, native policy:UNKNOWN, vmstate:B, vm thread flags:0x00000281) 
          CPU usage total: 14.735246000 secs, user: 10.841000000 secs, system: 3.894246000 secs, current category="Application" 
        Blocked on: sun/nio/ch/SelectionKeyImpl@0x00000000CED6A908 Owned by: "JetFusionWorker task-444" (J9VMThread:0x0000000030716B00, java/lang/Thread:0x00000000C71B8B38) 
        Heap bytes allocated since last GC cycle=0 (0x0)
        Java callstack: 
          at sun/nio/ch/AbstractPollSelectorImpl.swapChannels(AbstractPollSelectorImpl.java:193(Compiled Code)) 
            (entered lock: sun/nio/ch/SelectionKeyImpl@0x00000000C83DB470, entry count: 1) 
          at sun/nio/ch/AbstractPollSelectorImpl.adjustInteresting(AbstractPollSelectorImpl.java:259(Compiled Code)) 
          at sun/nio/ch/AbstractPollSelectorImpl.updateRegistrations(AbstractPollSelectorImpl.java:279(Compiled Code)) 
      ***   (entered lock: java/util/HashMap@0x00000000C908B5D8, entry count: 1) 
          at sun/nio/ch/PollSelectorImpl.doSelect(PollSelectorImpl.java:103(Compiled Code)) 
          at sun/nio/ch/SelectorImpl.lockAndDoSelect(SelectorImpl.java:98(Compiled Code)) 
            (entered lock: sun/nio/ch/Util$3@0x00000000C908A988, entry count: 1) 
            (entered lock: java/util/Collections$UnmodifiableSet@0x00000000C908A998, entry count: 1) (
            entered lock: sun/nio/ch/PollSelectorImpl@0x00000000C908A9A8, entry count: 1) 
          at sun/nio/ch/SelectorImpl.select(SelectorImpl.java:109(Compiled Code)) 
          at org/xnio/nio/WorkerThread.run(WcpHost:551(Compiled Code))
      
      "JetFusionWorker task-444" J9VMThread:0x0000000030716B00, omrthread_t:0x0000000194E89190, java/lang/Thread:0x00000000C71B8B38, state:B, prio=6 
          (java/lang/Thread getId:0xC1A4, isDaemon:true) 
          (native thread ID:0x6AF0EF, native priority:0x6, native policy:UNKNOWN, vmstate:B, vm thread flags:0x00000281) 
          CPU usage total: 0.283622000 secs, user: 0.202706000 secs, system: 0.080916000 secs, current category="Application" 
        Blocked on: java/util/HashMap@0x00000000C908B5D8 Owned by: "JetFusionWorker I/O-18" (J9VMThread:0x0000000030849B00, java/lang/Thread:0x00000000C908A8D0) 
        Heap bytes allocated since last GC cycle=0 (0x0) 
        Java callstack: 
          at sun/nio/ch/AbstractPollSelectorImpl.setInterest(AbstractPollSelectorImpl.java:97(Compiled Code)) 
          at sun/nio/ch/AbstractPollSelectorImpl.putEventOps(AbstractPollSelectorImpl.java:106(Compiled Code)) 
            (entered lock: java/lang/Object@0x00000000C908B5C8, entry count: 1) 
          at sun/nio/ch/SocketChannelImpl.translateAndSetInterestOps(SocketChannelImpl.java:988(Compiled Code)) 
          at sun/nio/ch/SelectionKeyImpl.nioInterestOps(SelectionKeyImpl.java:147(Compiled Code)) 
          at sun/nio/ch/SelectionKeyImpl.interestOps(SelectionKeyImpl.java:95(Compiled Code)) 
          at org/xnio/nio/WorkerThread.setOps(WcpHost:814(Compiled Code)) 
      ***   (entered lock: sun/nio/ch/SelectionKeyImpl@0x00000000CED6A908, entry count: 1) 
          at org/xnio/nio/NioHandle.resume(WcpHost:42(Compiled Code)) 
          at org/xnio/nio/NioSocketConduit.resumeReads(WcpHost:341(Compiled Code)) 
          at org/xnio/conduits/ConduitStreamSourceChannel.resumeReads(WcpHost:135(Compiled Code)) 
          ...

       (sorry about the messy trace; can't seem to tell Jira to suppress word wrapping)

      Attachments

        Activity

          People

            Unassigned Unassigned
            lawrence-dol Lawerence Dol (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: