JBoss Enterprise Application Platform
JBEAP-13404

Failing Netty Transport Upgrade in JBoss EAP, Resulting in Blocked JGroups Discovery Threads and OutOfMemoryErrors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Rejected
    • Affects Version/s: 7.0.z.GA
    • Fix Version/s: None
    • Component/s: ActiveMQ
    • Labels:
    • Target Release:
    • Steps to Reproduce:
      Reproducer in progress
    • Release Notes Text:
      Issue was due to a missing configuration property "ssl-enabled" on the connector. Closing.
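
      For reference, a minimal sketch of the kind of connector definition involved (the connector name, socket binding, and endpoint below are hypothetical; the key point from the resolution is the "ssl-enabled" parameter):

      ```xml
      <!-- messaging-activemq subsystem: an http-connector pointing at an
           SSL-terminated endpoint must declare ssl-enabled, otherwise the
           Netty HTTP upgrade handshake cannot complete. -->
      <http-connector name="http-connector" socket-binding="https" endpoint="http-acceptor">
          <param name="ssl-enabled" value="true"/>
      </http-connector>
      ```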

      Description

      In a multinode JBoss EAP / Artemis cluster (more than two nodes), after a cluster update, the JGroups discovery threads become blocked waiting for a lock held by the ActiveMQ server threads, which are repeatedly trying to upgrade the Netty connection and failing. JGroups messages pile up in the receiver, with the eventual result that the container fails with an OutOfMemoryError. [edit] This happens quickly in clustered configurations with more than one server per host controller, but can also happen in clusters of more than two nodes with one server per controller.[/edit] It is not detected as a deadlock, because the server thread is in a timed wait rather than blocked, but it re-acquires and holds the lock for as long as the connection upgrade keeps failing. Relevant stacks look like:

      "Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@d85ed25-1490599274)" #154 prio=5 os_prio=0 tid=0x00007f1c6469e000 nid=0x3a5e waiting on condition [0x00007f1c1fa63000]
         java.lang.Thread.State: TIMED_WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000007862028c0> (a java.util.concurrent.CountDownLatch$Sync)
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
      	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
      	at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler.awaitHandshake(NettyConnector.java:765)
      	at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:664)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1009)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1051)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1230)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:867)
      	- locked <0x000000078074b620> (a java.lang.Object)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:769)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.connect(ClientSessionFactoryImpl.java:238)
      	at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:760)
      	- locked <0x0000000772f8f010> (a org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl)
      	at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:617)
      	at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:598)
      	at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl$4.run(ServerLocatorImpl.java:562)
      	at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:103)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- <0x0000000772f92810> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      "activemq-discovery-group-thread-dg-group1" #153 daemon prio=5 os_prio=0 tid=0x00007f1c6469c000 nid=0x3a5d waiting for monitor entry [0x00007f1c1fb65000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connectorsChanged(ServerLocatorImpl.java:1431)
      	- waiting to lock <0x0000000772f8f010> (a org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl)
      	at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.callListeners(DiscoveryGroup.java:355)
      	at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.access$500(DiscoveryGroup.java:49)
      	at org.apache.activemq.artemis.core.cluster.DiscoveryGroup$DiscoveryRunnable.run(DiscoveryGroup.java:323)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- None
      

      It is unclear why the connection upgrade fails in this configuration. I was suspicious of the misspelled "httpPpgradeEndpoint" constant used in a header, but it appears this way in both the Artemis and WildFly code, so both sides agree on the header name.
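
      The blocking pattern in the stack traces above can be reduced to a small sketch (the class, thread names, and timings below are hypothetical illustrations, not EAP code): one thread parks on a latch in a retry loop while holding a monitor, so the other thread is reported as BLOCKED rather than deadlocked.

      ```java
      import java.util.concurrent.CountDownLatch;
      import java.util.concurrent.TimeUnit;

      public class UpgradeStallSketch {
          // Stands in for the ServerLocatorImpl instance both threads contend on.
          static final Object locatorLock = new Object();
          // A handshake latch that is never counted down, like the failing
          // HTTP upgrade awaited in NettyConnector$HttpUpgradeHandler.
          static final CountDownLatch handshake = new CountDownLatch(1);

          /** Returns the discovery thread's state while the server thread holds the lock. */
          static Thread.State run() throws InterruptedException {
              Thread server = new Thread(() -> {
                  synchronized (locatorLock) {
                      for (int attempt = 0; attempt < 3; attempt++) {
                          try {
                              // TIMED_WAITING inside the monitor: no deadlock is
                              // reported, but the lock stays held across retries.
                              handshake.await(200, TimeUnit.MILLISECONDS);
                          } catch (InterruptedException e) {
                              Thread.currentThread().interrupt();
                              return;
                          }
                      }
                  }
              }, "activemq-server");

              Thread discovery = new Thread(() -> {
                  synchronized (locatorLock) {
                      // connectorsChanged() would run here once the lock is free.
                  }
              }, "discovery-group");

              server.start();
              Thread.sleep(100);  // let the server thread take the lock first
              discovery.start();
              Thread.sleep(100);  // give the discovery thread time to block
              Thread.State state = discovery.getState(); // BLOCKED (on object monitor)
              server.join();
              discovery.join();
              return state;
          }

          public static void main(String[] args) throws InterruptedException {
              System.out.println("discovery thread: " + run());
          }
      }
      ```

      In the real cluster the latch eventually times out and the retry loop starts again, so the discovery thread starves indefinitely while JGroups messages accumulate.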

              People

              • Assignee:
                hawkinsds Duane Hawkins
                Reporter:
                hawkinsds Duane Hawkins
              • Votes: 0
                Watchers: 2
