JBoss Enterprise Application Platform 4 and 5 / JBPAPP-7004

Deadlock when using netty NIO acceptor


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: EAP_EWP 5.1.2
    • Affects Version/s: None
    • Component/s: HornetQ
    • Labels: None

    Description

      I've been trying to use netty acceptors with NIO, rather than blocking IO. HornetQ doesn't seem to work well in this configuration, freezing up after running for only a very short while.

      Acceptor configuration:

      <acceptor name="netty">
          <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
          <param key="host"  value="${hornetq.remoting.netty.host:localhost}"/>
          <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>
          <param key="batch-delay" value="50"/>
          <param key="use-nio" value="true" />
      </acceptor>
      

      A test case is attached that reproduces the issue with high probability. The test has 30 producers sending 3 KiB BytesMessages to queue1 via connection 1. 30 consumers pull from queue1 via connection 2 and send again to queue2. 30 more consumers consume from queue2 via connection 3. Everything is done via the JMS API, in JMS transactions.
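      For reference, the overall shape of such a reproducer is sketched below. This is a reconstruction from the description, not the attached test case; the JNDI names ("ConnectionFactory", "queue/queue1", "queue/queue2") and other details are assumptions.

      import javax.jms.*;
      import javax.naming.InitialContext;

      // Sketch of the reproducer described above: 30 producers -> queue1, 30 forwarders
      // queue1 -> queue2, 30 consumers <- queue2, all using transacted JMS sessions.
      public class NioDeadlockReproducerSketch
      {
         static final int THREADS = 30;
         static final byte[] PAYLOAD = new byte[3 * 1024]; // ~3 KiB per BytesMessage

         public static void main(String[] args) throws Exception
         {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("ConnectionFactory");
            Queue queue1 = (Queue) ctx.lookup("queue/queue1");
            Queue queue2 = (Queue) ctx.lookup("queue/queue2");

            Connection conn1 = cf.createConnection(); // connection 1: producers
            Connection conn2 = cf.createConnection(); // connection 2: queue1 -> queue2
            Connection conn3 = cf.createConnection(); // connection 3: consumers
            conn2.start();
            conn3.start();

            for (int i = 0; i < THREADS; i++)
            {
               new Thread(() -> {
                  try
                  {
                     // Producer: send 3 KiB BytesMessages to queue1, committing each one.
                     Session s = conn1.createSession(true, Session.SESSION_TRANSACTED);
                     MessageProducer p = s.createProducer(queue1);
                     while (true)
                     {
                        BytesMessage m = s.createBytesMessage();
                        m.writeBytes(PAYLOAD);
                        p.send(m);
                        s.commit();
                     }
                  }
                  catch (JMSException e) { e.printStackTrace(); }
               }).start();

               new Thread(() -> {
                  try
                  {
                     // Forwarder: receive from queue1 and re-send to queue2 in one transaction.
                     Session s = conn2.createSession(true, Session.SESSION_TRANSACTED);
                     MessageConsumer c = s.createConsumer(queue1);
                     MessageProducer p = s.createProducer(queue2);
                     while (true)
                     {
                        p.send(c.receive());
                        s.commit();
                     }
                  }
                  catch (JMSException e) { e.printStackTrace(); }
               }).start();

               new Thread(() -> {
                  try
                  {
                     // Consumer: drain queue2.
                     Session s = conn3.createSession(true, Session.SESSION_TRANSACTED);
                     MessageConsumer c = s.createConsumer(queue2);
                     while (true)
                     {
                        c.receive();
                        s.commit();
                     }
                  }
                  catch (JMSException e) { e.printStackTrace(); }
               }).start();
            }
         }
      }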

      The server freezes after a very short time running (almost immediately). I have to kill it with kill -9. With a profiler, I see a deadlock:

      New I/O server worker #1-2 [BLOCKED; waiting to lock java.lang.Object@904497]
      org.hornetq.core.server.impl.ServerConsumerImpl.promptDelivery(ServerConsumerImpl.java:664)
      org.hornetq.core.server.impl.ServerConsumerImpl.readyForWriting(ServerConsumerImpl.java:642)
      org.hornetq.core.remoting.impl.netty.NettyConnection.fireReady(NettyConnection.java:264)
      org.hornetq.core.remoting.impl.netty.NettyAcceptor$Listener.connectionReadyForWrites(NettyAcceptor.java:695)
      org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.channelInterestChanged(HornetQChannelHandler.java:65)
      org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:136)
      org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
      org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)
      org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelInterestChanged(SimpleChannelUpstreamHandler.java:183)
      org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:116)
      org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
      org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)
      org.jboss.netty.channel.Channels.fireChannelInterestChanged(Channels.java:335)
      org.jboss.netty.channel.socket.nio.NioSocketChannel$WriteRequestQueue.poll(NioSocketChannel.java:242)
      org.jboss.netty.channel.socket.nio.NioSocketChannel$WriteRequestQueue.poll(NioSocketChannel.java:197)
      org.jboss.netty.channel.socket.nio.NioWorker.write0(NioWorker.java:455)
      org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:388)
      org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:137)
      org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
      org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendDownstream(StaticChannelPipeline.java:502)
      org.jboss.netty.channel.SimpleChannelHandler.writeRequested(SimpleChannelHandler.java:304)
      org.jboss.netty.channel.SimpleChannelHandler.handleDownstream(SimpleChannelHandler.java:266)
      org.jboss.netty.channel.StaticChannelPipeline.sendDownstream(StaticChannelPipeline.java:385)
      org.jboss.netty.channel.StaticChannelPipeline.sendDownstream(StaticChannelPipeline.java:380)
      org.jboss.netty.channel.Channels.write(Channels.java:611)
      org.jboss.netty.channel.Channels.write(Channels.java:578)
      org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:259)
      org.hornetq.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:211)
      org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:199)
      org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:142)
      org.hornetq.core.protocol.core.impl.CoreSessionCallback.sendProducerCreditsMessage(CoreSessionCallback.java:87)
      org.hornetq.core.server.impl.ServerSessionImpl$2.run(ServerSessionImpl.java:1151)
      org.hornetq.core.paging.impl.PagingStoreImpl.executeRunnableWhenMemoryAvailable(PagingStoreImpl.java:741)
      org.hornetq.core.server.impl.ServerSessionImpl.requestProducerCredits(ServerSessionImpl.java:1147)
      org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:473)
      org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474)
      org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496)
      org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457)
      org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459)
      org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73)
      org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)
      org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
      org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)
      org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287)
      org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)
      org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)
      org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
      org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
      org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)
      org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
      org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
      org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
      org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
      org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
      org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
      org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
      org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      java.lang.Thread.run(Thread.java:636)

      Thread-5 (group:HornetQ-server-threads12254719-18227730) [BLOCKED; waiting to lock java.lang.Object@c7d960]
      org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:161)
      org.hornetq.core.protocol.core.impl.ChannelImpl.sendBatched(ChannelImpl.java:147)
      org.hornetq.core.protocol.core.impl.CoreSessionCallback.sendMessage(CoreSessionCallback.java:76)
      org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:704)
      org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:291)
      org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:2017)
      org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1587)
      org.hornetq.core.server.impl.QueueImpl.doPoll(QueueImpl.java:1472)
      org.hornetq.core.server.impl.QueueImpl.access$1100(QueueImpl.java:72)
      org.hornetq.core.server.impl.QueueImpl$ConcurrentPoller.run(QueueImpl.java:2299)
      org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      java.lang.Thread.run(Thread.java:636)

      There are also other manifestations of the deadlock, but all involve ServerConsumerImpl.promptDelivery. See http://community.jboss.org/message/617780.

      This is my analysis of the situation:

      Thread-5

      • passes through ServerConsumerImpl.handle(..), acquiring the "lock" lock object.
      • Just a few calls later, it also calls ChannelImpl.send(..), where it tries to acquire the "sendLock" lock object (in a synchronized block).

      NIO worker 1-2 (running concurrently)

      • invokes CoreSessionCallback.sendProducerCreditsMessage(..)
      • which leads to a call to ChannelImpl.send(..), where it acquires the "sendLock".
      • This leads to a netty channel write request, which in this case is processed immediately, so the request passes downstream through the netty pipeline (synchronously!).
      • After writing, a netty event (channelInterestChanged) is sent back upstream and handled by HornetQChannelHandler.
      • This leads to a call to ServerConsumerImpl.promptDelivery(), which tries to synchronize on the "lock" object, which is already held by Thread-5.
      • Thread-5 is in turn still waiting on "sendLock", already acquired by NIO worker 1-2, leading to the deadlock (see the minimal sketch below).
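      In isolation, the inverted lock ordering can be shown with the minimal, self-contained sketch below (plain Java, with stand-in objects for ServerConsumerImpl's "lock" and ChannelImpl's "sendLock"; not actual HornetQ code):

      public class LockOrderingDeadlockSketch
      {
         private static final Object lock = new Object();     // stands in for ServerConsumerImpl's "lock"
         private static final Object sendLock = new Object(); // stands in for ChannelImpl's "sendLock"

         public static void main(String[] args)
         {
            // "Thread-5": QueueImpl delivery -> ServerConsumerImpl.handle (takes "lock")
            // -> ChannelImpl.send (then waits for "sendLock")
            Thread delivery = new Thread(() -> {
               synchronized (lock)
               {
                  pause(); // give the other thread time to take "sendLock"
                  synchronized (sendLock)
                  {
                     System.out.println("delivered");
                  }
               }
            }, "Thread-5");

            // "NIO worker": ChannelImpl.send (takes "sendLock") -> synchronous netty write
            // -> channelInterestChanged -> ServerConsumerImpl.promptDelivery (then waits for "lock")
            Thread nioWorker = new Thread(() -> {
               synchronized (sendLock)
               {
                  pause();
                  synchronized (lock)
                  {
                     System.out.println("prompted delivery");
                  }
               }
            }, "New I/O server worker #1-2");

            delivery.start();
            nioWorker.start();
            // With the pauses in place, each thread now blocks forever on the lock
            // held by the other: the same cycle as in the stack traces above.
         }

         private static void pause()
         {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
         }
      }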

      My train of thought was then:

      • In promptDelivery(), there is a synchronized block on the "lock" object. If this lock could be eliminated, then the NIO worker would acquire only one of the two locks, and the deadlock is avoided.
      • In promptDelivery(), it also seems from code comments that the reason for the synchronized block is to protect the state integrity of largeMessageDeliverer.
      • So, I thought that it might just work to use a different lock object, dedicated to protecting largeMessageDeliverer. This way, promptDelivery() doesn't have to synchronize on the "lock" object, so there's no deadlock.

      A patch that uses a separate lock for largeMessageDeliverer is attached. I didn't test the effects on large messages, because I don't know how to do it properly.
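      For illustration only, the idea behind such a patch might look roughly like the sketch below. The field names come from the stack traces above; the dedicated largeMessageLock, the placeholder deliverer type, and the method bodies are simplifications, not the attached patch.

      // Illustrative sketch: promptDelivery() synchronizes on a dedicated object guarding
      // largeMessageDeliverer instead of the consumer-wide "lock", so the NIO worker never
      // needs "lock" while it is already inside ChannelImpl.send() holding "sendLock".
      public class ServerConsumerSketch
      {
         private final Object lock = new Object();             // still guards handle()/delivery state
         private final Object largeMessageLock = new Object(); // hypothetical dedicated lock
         private volatile Runnable largeMessageDeliverer;      // placeholder for the real deliverer type

         public void promptDelivery()
         {
            // was: synchronized (lock) { ... } -- the edge that completed the deadlock cycle
            synchronized (largeMessageLock)
            {
               if (largeMessageDeliverer != null)
               {
                  largeMessageDeliverer.run(); // resume the pending large-message delivery
                  return;
               }
            }
            forceDelivery(); // normal delivery path, outside the large-message guard
         }

         private void forceDelivery() { /* prompt the queue to deliver again */ }
      }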

      Attachments

        Activity

          People

            Assignee: Clebert Suconic (csuconic@redhat.com)
            Reporter: Carl Heymann (carl.heymann) (Inactive)
            Jared Morgan (Inactive)
            Votes: 0
            Watchers: 1
