  1. WildFly
  2. WFLY-4805

Distributable app causes TransferQueueBundler Invalid argument and Topology errors


Details

    Steps to Reproduce
      1. Unzip 3 vanilla EAP 7 DR4 instances on a single box
      2. Configure a common non-localhost interface in standalone-ha.xml
      3. Configure instance-id for Undertow and set the jboss.mod_cluster.jvmRoute property; each EAP instance gets a different value
      4. Set port offset
      5. Copy clusterbench.war into standalone/deployments
      6. Start all 3 instances
      7. Send some requests with
        curl http://${ADDRESS}:${PORT}/clusterbench/session -b cookie.txt -c cookie.txt

        and/or

        curl http://${ADDRESS}:${PORT}/clusterbench/requestinfo -b cookie.txt -c cookie.txt
      8. Shut down and restart the nodes one by one. Be nice: don't send requests to nodes that are in the middle of startup or shutdown
      9. Observe the errors reported in the logs and the occasional HTTP 503 error on the client side
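
The request steps above could be scripted roughly as follows. This is only a sketch, not part of the original report: the base port 8080 is assumed to be the stock HTTP port before any offset, and the helper names (`instance_urls`, `make_session`, `probe`) as well as the offsets in the usage note are hypothetical. The cookie-keeping opener stands in for curl's `-b cookie.txt -c cookie.txt`.

```python
import http.cookiejar
import urllib.error
import urllib.request

# Assumption: stock HTTP port of each instance before its port offset is applied.
DEFAULT_HTTP_PORT = 8080


def instance_urls(address, port_offsets, path="/clusterbench/session"):
    """Build one URL per instance from its port offset."""
    return [
        f"http://{address}:{DEFAULT_HTTP_PORT + offset}{path}"
        for offset in port_offsets
    ]


def make_session():
    """Return an opener that keeps cookies between requests, like curl -b/-c."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def probe(opener, url, timeout=5.0):
    """Return the HTTP status for url, or None if the node is unreachable."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as the occasional 503 from step 9 end up here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Node is down or in the middle of a restart.
        return None
```

For example, `instance_urls(address, [0, 100, 200])` would yield the three session URLs for hypothetical offsets 0, 100 and 200, and calling `probe(make_session(), url)` repeatedly while nodes are cycled would surface any 503 responses.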

    Description

      Simple shutdown failover tests are failing, apparently due to an error in, or a misconfiguration of, the distributed cache.
      If one follows the aforementioned steps to reproduce, the following symptoms appear:

      • Variations on
        SEVERE [org.jgroups.protocols.UDP] (TransferQueueBundler,ee,rhel7gax86-64) JGRP000029: rhel7gax86-64: failed sending message to rhel7gax86-64 (59 bytes): java.io.IOException: Invalid argument, headers: UNICAST3: ACK, seqno=9410, ts=217, UDP: [cluster_name=ee]
      • Failures of this kind:
        ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (transport-thread--p2-t12) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 11
            at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:92)
            at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
            at org.infinispan.statetransfer.StateTransferInterceptor.handleTxWriteCommand(StateTransferInterceptor.java:285)
            at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:254)
            at org.infinispan.statetransfer.StateTransferInterceptor.visitRemoveCommand(StateTransferInterceptor.java:130)
            at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:58)
            at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:97)
            at org.infinispan.interceptors.CacheMgmtInterceptor.visitRemoveCommand(CacheMgmtInterceptor.java:209)
            at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:58)
            at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:97)
            at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
            at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
            at org.infinispan.commands.AbstractVisitor.visitRemoveCommand(AbstractVisitor.java:49)
            at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:58)
            at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:336)
            at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1617)
            at org.infinispan.cache.impl.CacheImpl.removeInternal(CacheImpl.java:579)
            at org.infinispan.cache.impl.CacheImpl.remove(CacheImpl.java:572)
            at org.infinispan.cache.impl.DecoratedCache.remove(DecoratedCache.java:442)
            at org.infinispan.cache.impl.AbstractDelegatingCache.remove(AbstractDelegatingCache.java:297)
            at org.wildfly.clustering.server.registry.CacheRegistry.topologyChanged(CacheRegistry.java:152)
            at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl$1.run(AbstractListenerImpl.java:286)
            at org.infinispan.util.concurrent.WithinThreadExecutor.execute(WithinThreadExecutor.java:22)
            at org.infinispan.notifications.impl.AbstractListenerImpl$ListenerInvocationImpl.invoke(AbstractListenerImpl.java:309)
            at org.infinispan.notifications.cachelistener.CacheNotifierImpl$BaseCacheEntryListenerInvocation.doRealInvocation(CacheNotifierImpl.java:1180)
            at org.infinispan.notifications.cachelistener.CacheNotifierImpl$BaseCacheEntryListenerInvocation.invoke(CacheNotifierImpl.java:1139)
            at org.infinispan.notifications.cachelistener.CacheNotifierImpl$BaseCacheEntryListenerInvocation.invoke(CacheNotifierImpl.java:1105)
            at org.infinispan.notifications.cachelistener.CacheNotifierImpl.notifyTopologyChanged(CacheNotifierImpl.java:560)
            at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:201)
            at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:45)
            at org.infinispan.statetransfer.StateTransferManagerImpl$1.updateConsistentHash(StateTransferManagerImpl.java:113)
            at org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:285)
            at org.infinispan.topology.LocalTopologyManagerImpl$1.run(LocalTopologyManagerImpl.java:218)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at org.infinispan.executors.SemaphoreCompletionService$QueueingTask.runInternal(SemaphoreCompletionService.java:166)
            at org.infinispan.executors.SemaphoreCompletionService$QueueingTask.run(SemaphoreCompletionService.java:144)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
      • Warnings such as:
        WARN  [org.infinispan.transaction.impl.TransactionTable] (TxCleanupService,dist,rhel7gax86-64) ISPN000326: Remote transaction GlobalTransaction:<rhel7gax86-64>:9373:remote timed out. Rolling back after 70810 ms
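
When going through the attached logs, it may help to tally how often each of the message IDs above occurs per node. The following is a minimal sketch under the assumption that every relevant line carries an ID of the form `JGRP` or `ISPN` followed by six digits, as in the excerpts above; the function name `count_message_ids` is hypothetical.

```python
import re
from collections import Counter

# Matches IDs such as JGRP000029, ISPN000136 or ISPN000326.
MESSAGE_ID = re.compile(r"\b((?:JGRP|ISPN)\d{6})\b")


def count_message_ids(lines):
    """Count the first JGroups/Infinispan message ID found on each log line."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Stack-trace frames carry no message ID and are skipped, so feeding the function an open log file (for instance `count_message_ids(open(path))` with a path of your choosing) counts one hit per logged event rather than per line of trace.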

      Logs and configuration from one such play-and-test scenario are attached: logs.zip, configs.zip.

      Any ideas which configuration directive or application setting might be at the bottom of this?
      Needless to say, every such test passes without any error on EAP 6.4 and 6.3, in both the shutdown and the undeploy failover scenarios.

      Thanks for any comments.

      Attachments

        1. clusterbench.war
          392 kB
        2. configs.zip
          14 kB
        3. logs.zip
          35 kB

            People

              rpelisse@redhat.com Romain Pelisse
              mbabacek1@redhat.com Michal Karm
              Michal Karm
