Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-15352

NullPointerException in case of failover brings to HR Client hanging

    XMLWordPrintable

Details

    Description

      I have failover scenario where the 490k entries load starts on site1, then in 10 seconds the whole site1 cluster is killed. So the client which was loading entries to site1 is switched automatically to site2, and in the most cases the execution ends up successfully. But sometimes (I have seen this behavior 4 times from 10 runs), at the moment when site1 is killed, the client gives the following exception:

      03:33:40,348 ERROR [org.infinispan.HOTROD] (HotRod-client-async-pool-1-1) Failed to complete PutOperation{api-general-filestore-without-expiration, key=[B0x4A146B65795F30303030303030303030..[22], value=[B0x4A5276616C75655F6B65795F30303030..[84], flags=6, connection=10.16.90.106/10.16.90.106:11222}
      java.lang.NullPointerException: null
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.executeDirectlyIfPossible(ChannelPool.java:113) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.acquire(ChannelPool.java:79) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:328) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:307) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:321) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:306) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:282) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.fetchChannelAndInvoke(AbstractKeyOperation.java:47) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.retryIfNotDone(RetryOnFailureOperation.java:101) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.logAndRetryOrFail(RetryOnFailureOperation.java:208) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.channelInactive(RetryOnFailureOperation.java:136) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.channelInactive(HeaderDecoder.java:232) [infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) [netty-handler-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) [netty-common-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) [netty-common-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) [netty-common-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413) [netty-transport-classes-epoll-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.100.Final.jar:4.1.100.Final]
      	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.100.Final.jar:4.1.100.Final]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      	at java.lang.Thread.run(Thread.java:829) [?:?]
      

      and then the the entries load process hangs. I have a thread dump from the client for this , from where I found this part:

      Thread dump result on client:
      376 "Loader-1" #58 prio=5 os_prio=0 cpu=314.65ms elapsed=7391.68s tid=0x00007faeb03f9800 nid=0x272 waiting on condition  [0x00007fae533f2000]
      377    java.lang.Thread.State: TIMED_WAITING (parking)
      378         at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
      379         - parking to wait for  <0x000000060d630c78> (a java.util.concurrent.CompletableFuture$Signaller)
      380         at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.12/LockSupport.java:234)
      381         at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.12/CompletableFuture.java:1798)
      382         at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.12/ForkJoinPool.java:3128)
      383         at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.12/CompletableFuture.java:1868)
      384         at java.util.concurrent.CompletableFuture.get(java.base@11.0.12/CompletableFuture.java:2021)
      385         at org.infinispan.client.hotrod.impl.Util.await(Util.java:52)
      386         at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:196)
      387         at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:186)
      388         at org.radargun.service.HotRodOperations$HotRodCache.put(HotRodOperations.java:83)
      389         at org.radargun.stages.cache.test.LoadStage$NonTxLoader.loadDataUnit(LoadStage.java:275)
      390         at org.radargun.stages.test.LoadStage$Loader.run(LoadStage.java:125)
      
      

      The cache configuration used for this scenario is given below and the issue appears for both SYNC/ASYNC backup strategies:

      <distributed-cache name="api-general-filestore-without-expiration" owners="2" segments="30" mode="SYNC" remote-timeout="25000" start="EAGER">
              <backups>
                  <backup site="site02" strategy="ASYNC" timeout="30000">
                      <take-offline after-failures="-1" min-wait="60000"/>
                  </backup>
              </backups>
          </distributed-cache>
      

      Please note, the scenario is executed under JDK11 as it is running on our perf-lab. At the moment there is no possibility to run the scenario with JDK17 to have more concrete message on exception.

      Attachments

        Activity

          People

            rh-ee-jbolina Jose Bolina
            amanukya@redhat.com Anna Manukyan
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: