  1. Hawkular Metrics
  2. HWKMETRICS-797

Preparing statements for temp tables can fail with multi-node Cassandra cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 0.30.0
    • Fix Version/s: 0.31.0, 0.30.7
    • Component/s: Core
    • Labels:
      None

      Description

      The TempTableCreator job creates new tables every two hours. The job does not create the prepared statements for those tables itself, because only a single instance of the job runs while there can be multiple replicas of hawkular-metrics, and every replica needs its own prepared statements. Instead, DataAccessImpl registers a SchemaChangeListener with the Cassandra driver. The listener is notified when a table is created and then creates the prepared statements for it.
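
The listener arrangement described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the DataAccessImpl code: `FakeCluster`, `TableAddedListener`, and `Replica` are hypothetical stand-ins for the Cassandra driver, its SchemaChangeListener callback, and a hawkular-metrics replica, and the "prepared statement" is only a placeholder string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TempTableListenerSketch {

    /** Hypothetical stand-in for the driver's schema-change callback. */
    public interface TableAddedListener {
        void onTableAdded(String tableName);
    }

    /** Hypothetical stand-in for the cluster: tells every registered listener about a new table. */
    public static class FakeCluster {
        private final List<TableAddedListener> listeners = new ArrayList<>();

        public void register(TableAddedListener listener) {
            listeners.add(listener);
        }

        public void createTable(String tableName) {
            for (TableAddedListener listener : listeners) {
                listener.onTableAdded(tableName);
            }
        }
    }

    /** One replica: keeps a placeholder "prepared statement" per temp table it has been told about. */
    public static class Replica implements TableAddedListener {
        private final Map<String, String> statements = new ConcurrentHashMap<>();

        @Override
        public void onTableAdded(String tableName) {
            statements.put(tableName, "prepared statements for " + tableName);
        }

        public boolean hasStatementsFor(String tableName) {
            return statements.containsKey(tableName);
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        Replica replicaA = new Replica();
        Replica replicaB = new Replica();
        cluster.register(replicaA);
        cluster.register(replicaB);

        // Only one job instance creates the table, yet both replicas react to it.
        cluster.createTable("data_temp_2018072110");
        System.out.println(replicaA.hasStatementsFor("data_temp_2018072110"));
        System.out.println(replicaB.hasStatementsFor("data_temp_2018072110"));
    }
}
```

The point of the design is that table creation happens once, while statement preparation happens in every replica that registered a listener.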

      There is a race condition when running multiple C* nodes. The listener can be notified before the schema change has finished propagating across the cluster, so the prepare request may reach a node that does not know about the new table yet. This can result in an error like:

      DEBUG 2018-07-20 15:59:17,167 [cluster1-worker-2] org.hawkular.metrics.core.service.DataAccessImpl$TemporaryTableStatementCreator:onTableAdded:1371 - Registering prepared statements for table data_temp_2018072110
      Exception in thread "RxIoScheduler-2" java.lang.IllegalStateException: Exception thrown on Scheduler.Worker thread. Add `onError` handling.
      	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:57)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: rx.exceptions.OnErrorNotImplementedException: unconfigured table data_temp_2018072110
      	at rx.internal.util.InternalObservableUtils$ErrorNotImplementedAction.call(InternalObservableUtils.java:386)
      	at rx.internal.util.InternalObservableUtils$ErrorNotImplementedAction.call(InternalObservableUtils.java:383)
      	at rx.internal.util.ActionSubscriber.onError(ActionSubscriber.java:44)
      	at rx.observers.SafeSubscriber._onError(SafeSubscriber.java:153)
      	at rx.observers.SafeSubscriber.onError(SafeSubscriber.java:115)
      	at rx.internal.operators.OperatorSubscribeOn$SubscribeOnSubscriber.onError(OperatorSubscribeOn.java:80)
      	at rx.exceptions.Exceptions.throwOrReport(Exceptions.java:212)
      	at rx.internal.operators.OnSubscribeFromCallable.call(OnSubscribeFromCallable.java:50)
      	at rx.internal.operators.OnSubscribeFromCallable.call(OnSubscribeFromCallable.java:33)
      	at rx.Observable.unsafeSubscribe(Observable.java:10256)
      	at rx.internal.operators.OperatorSubscribeOn$SubscribeOnSubscriber.call(OperatorSubscribeOn.java:100)
      	at rx.internal.schedulers.CachedThreadScheduler$EventLoopWorker$1.call(CachedThreadScheduler.java:230)
      	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
      	... 7 more
      Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table data_temp_2018072110
      	at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
      	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
      	at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:104)
      	at org.hawkular.metrics.core.service.DataAccessImpl.prepareTempStatements(DataAccessImpl.java:392)
      	at org.hawkular.metrics.core.service.TestDataAccessFactory$1.prepareTempStatements(TestDataAccessFactory.java:53)
      	at org.hawkular.metrics.core.service.DataAccessImpl$TemporaryTableStatementCreator.lambda$onTableAdded$0(DataAccessImpl.java:1373)
      	at rx.internal.operators.OnSubscribeFromCallable.call(OnSubscribeFromCallable.java:48)
      	... 12 more
      Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table data_temp_2018072110
      	at com.datastax.driver.core.Responses$Error.asException(Responses.java:148)
      	at com.datastax.driver.core.SessionManager$4.apply(SessionManager.java:220)
      	at com.datastax.driver.core.SessionManager$4.apply(SessionManager.java:196)
      	at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:906)
      	at com.google.common.util.concurrent.Futures$1$1.run(Futures.java:635)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
      	... 1 more
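
One possible mitigation for the failure above is to retry the prepare call for a short while instead of letting the first "unconfigured table" error escape the worker thread. The sketch below illustrates only that idea and is not necessarily the fix that was applied for this issue: `prepareWithRetry` and `TableNotVisibleException` are hypothetical names, the exception stands in for the driver's InvalidQueryException, and the retry count and delay are arbitrary.

```java
import java.util.function.Supplier;

public class PrepareRetrySketch {

    /** Hypothetical stand-in for the driver's InvalidQueryException ("unconfigured table ..."). */
    public static class TableNotVisibleException extends RuntimeException {
        public TableNotVisibleException(String message) {
            super(message);
        }
    }

    /**
     * Runs prepareAction until it succeeds or maxAttempts is reached,
     * sleeping delayMillis * attempt between failed attempts.
     * Rethrows the last failure if every attempt fails.
     */
    public static <T> T prepareWithRetry(Supplier<T> prepareAction, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TableNotVisibleException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return prepareAction.get();
            } catch (TableNotVisibleException e) {
                // The schema change may not have reached the node that handled the request yet.
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis * attempt);
                }
            }
        }
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated prepare call: the table becomes visible on the third attempt.
        String result = prepareWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TableNotVisibleException("unconfigured table data_temp_2018072110");
            }
            return "prepared statements for data_temp_2018072110";
        }, 5, 10L);
        System.out.println(result + " (attempts: " + calls[0] + ")");
    }
}
```

Independently of retrying, the first exception in the trace asks for `onError` handling on the subscription, so that a failed prepare is reported to the caller rather than thrown on the scheduler thread.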
      

      I have recently seen write timeouts in some OpenShift clusters running with very small loads, and I think those timeouts are due to this bug. Say we have two nodes, C1 and C2. C1 receives a request that is a batch statement of inserts. C1 forwards the request to C2. C2 has to prepare the statements. I think the extra overhead of executing non-prepared statements might be causing C1 to time out.

                People

                • Assignee:
                  john.sanda John Sanda
                  Reporter:
                  john.sanda John Sanda