Uploaded image for project: 'WildFly WIP'
  1. WildFly WIP
  2. WFWIP-27

[Artemis 2.x Upgrade] Failover does not work with HTTP connectors/acceptors

    Details

    • Steps to Reproduce:
      Hide

      At first the test must be unignored by removing the Ignore annotation.

      export DISABLE_TRACE_LOGS=false
      
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/258//artifact/jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=CleanUpOldReplicasTestCase#testMaxReplicatedJournalSize0 -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1521544306-SNAPSHOT -DfailIfNoTests=false -Deap=7x | tee log
      

      If you want to run the test with netty connectors/acceptors, add this parameter -Dprepare.param.CONNECTOR_TYPE=NETTY_NIO

      Show
      At first the test must be unignored by removing the Ignore annotation. export DISABLE_TRACE_LOGS= false git clone git: //git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ groovy -DEAP_ZIP_URL=https: //eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/258//artifact/jboss-eap.zip PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=CleanUpOldReplicasTestCase#testMaxReplicatedJournalSize0 -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1521544306-SNAPSHOT -DfailIfNoTests= false -Deap=7x | tee log If you want to run the test with netty connectors/acceptors, add this parameter -Dprepare.param.CONNECTOR_TYPE=NETTY_NIO

      Description

      Failover does not work with HTTP connectors/acceptors.

      Scenario:

      1. There are two Wildfly servers configured as Live-Backup pair
      2. There is one JMS producer and one JMS receiver which sends/receives messages
      3. Live server is several times killed and restarted.

      Expectation: Always when the Live server is killed or restarted, clients do failover or failback.
      Reality: Sometimes happens that clients don't do failover.

      Users impact: One of basics feature of HA, failover, does not work properly.

      Blocker priority was set because this is regression against previous releases of Wildfly.

      Detail description of issue

      In the trace logs it can be seen that clients send HTTP handshake request to active backup, but the handshake fails. All checks (and logs) say that the backup is active in this time period. I tried to run the test with Netty connectors/acceptors and I didn't see this issue.

      Backup log
      07:52:47,856 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f01f347)) AMQ222029: Could not locate page transaction 3 995, ignoring message on position PagePositionImpl [pageNr=2, messageNr=55, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
      07:52:47,856 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f01f347)) AMQ222029: Could not locate page transaction 3 995, ignoring message on position PagePositionImpl [pageNr=2, messageNr=56, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
      07:52:47,863 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485068, b=2147485067]
      07:52:47,863 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485076, b=2147485075]
      07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147484450, b=2147484449]
      07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147485072, b=2147485071]
      07:52:47,864 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221005: Deleting pending large message as it was not completed: Pair[a=2147484454, b=2147484453]
      07:52:47,866 WARN  [org.apache.activemq.artemis.core.client] (activemq-discovery-group-thread-dg-group1) AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=0dee81c9-2d9d-11e8-ba3f-cc3d825f79a4
      07:52:47,867 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ221007: Server is now live
      
      Clients log
      07:52:49,274 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote destination: localhost/127.0.0.1:9080
      07:52:49,274 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector towards NettyConnector [host=localhost, port=8080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] failed
      07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1081] Trying backup config = TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=9080&host=localhost
      07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:447] Connector NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] using native epoll
      07:52:49,275 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
      GET  HTTP/1.1
      host: localhost
      upgrade: activemq-remoting
      connection: upgrade
      activemqServerName: default
      httpUpgradeEndpoint: acceptor
      Sec-ActiveMQRemoting-Key: QWV3bCfgh75NjWH3pZV5Ew==
      07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:601] AMQ211002: Started EPOLL Netty Connector version 4.1.16.Final to localhost:9080
      07:52:49,275 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:726] Remote destination: localhost/127.0.0.1:9080
      07:52:49,276 Thread-79 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector:797] Sending HTTP request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
      GET  HTTP/1.1
      host: localhost
      upgrade: activemq-remoting
      connection: upgrade
      activemqServerName: default
      httpUpgradeEndpoint: acceptor
      Sec-ActiveMQRemoting-Key: P9xBwRk1eZP5QjDWjqYuIg==
      07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876] Received msg=DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
      HTTP/1.1 200 OK
      Connection: keep-alive
      Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
      Content-Length: 2426
      Content-Type: text/html
      Accept-Ranges: bytes
      Date: Thu, 22 Mar 2018 06:52:49 GMT
      07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903] AMQ214023: HTTP Handshake failed, received DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
      HTTP/1.1 200 OK
      Connection: keep-alive
      Last-Modified: Thu, 22 Mar 2018 06:47:03 GMT
      Content-Length: 2426
      Content-Type: text/html
      Accept-Ranges: bytes
      Date: Thu, 22 Mar 2018 06:52:49 GMT
      07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:876] Received msg=DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
      07:52:49,276 Thread-2 (ActiveMQ-client-netty-threads) DEBUG [org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$HttpUpgradeHandler:903] AMQ214023: HTTP Handshake failed, received DefaultHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 829, cap: 829/829, unwrapped: PooledUnsafeDirectByteBuf(ridx: 1024, widx: 1024, cap: 1024)), decoderResult: success)
      07:52:49,277 Thread-80 (ActiveMQ-client-global-threads) DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl:1040] Connector towards NettyConnector [host=localhost, port=9080, httpEnabled=false, httpUpgradeEnabled=true, useServlet=false, servletPath=/messaging/ActiveMQServlet, sslEnabled=false, useNio=true, activemqServerName=default, httpUpgradeEndpoint=acceptor] failed
      

        Gliffy Diagrams

          Attachments

          1. clients.log
            8.31 MB
          2. server1.log
            117 kB
          3. server2.log
            104 kB
          4. standalone-full-ha.xml
            40 kB
          5. standalone-full-ha.xml
            40 kB
          6. test.log
            719 kB

            Issue Links

              Activity

                People

                • Assignee:
                  jmesnil Jeff Mesnil
                  Reporter:
                  eduda Erich Duda
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  6 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: