Uploaded image for project: 'AMQ Streams'
  1. AMQ Streams
  2. ENTMQST-1153

Scaling Kafka logs and exception and doesn't work properly

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Major
    • 1.4.0.GA
    • 1.3.0.GA
    • cluster-operator
    • None
    • 0
    • 2019.9, 2019.10, 2019.12, 2019.13, 2019.14

    Description

      As reported by jstejska, when you try to scale Kafka cluster, you can often see following errors (especially if it uses external listener) and does rolling update before the scaleup:

      2019-07-12 11:25:37 INFO  KafkaSetOperator:75 - Scaling up to 4 replicas
      2019-07-12 11:25:37 ERROR StatefulSetOperationsImpl:126 - 0/4 pod(s) ready for StatefulSet: my-cluster-kafka in namespace: kafka-cluster-test  after waiting for 0 seconds so giving up
      2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Endpoints resource my-cluster-kafka-bootstrap in namespace kafka-cluster-test to get ready
      2019-07-12 11:25:37 ERROR StatefulSetOperationsImpl:115 - Error while waiting for resource to be scaled.
      io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [StatefulSet]  with name: [my-cluster-kafka]  in namespace: [kafka-cluster-test]  failed.
              at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:161) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.internal.RollableScalableResourceOperation.lambda$waitUntilScaled$0(RollableScalableResourceOperation.java:94) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212]
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_212]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_212]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_212]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
      Caused by: java.io.InterruptedIOException: interrupted
              at okio.Timeout.throwIfReached(Timeout.java:146) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.Okio$2.read(Okio.java:137) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$1(HttpClientUtils.java:100) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.RealCall.execute(RealCall.java:92) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:330) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:311) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:749) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              ... 9 more
      2019-07-12 11:25:37 DEBUG Util:52 - Endpoints resource my-cluster-kafka-bootstrap in namespace kafka-cluster-test is ready
      2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Endpoints resource my-cluster-kafka-brokers in namespace kafka-cluster-test to get ready
      2019-07-12 11:25:37 DEBUG Util:52 - Endpoints resource my-cluster-kafka-brokers in namespace kafka-cluster-test is ready
      2019-07-12 11:25:37 DEBUG ServiceAccountOperator:107 - ServiceAccount kafka-cluster-test/my-cluster-topic-operator does not exist, noop
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:107 - RoleBinding kafka-cluster-test/strimzi-my-cluster-topic-operator does not exist, noop
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:107 - ConfigMap kafka-cluster-test/my-cluster-topic-operator-config does not exist, noop
      2019-07-12 11:25:37 DEBUG SecretOperator:107 - Secret kafka-cluster-test/my-cluster-topic-operator-certs does not exist, noop
      2019-07-12 11:25:37 DEBUG DeploymentOperator:107 - Deployment kafka-cluster-test/my-cluster-topic-operator does not exist, noop
      2019-07-12 11:25:37 DEBUG ServiceAccountOperator:98 - ServiceAccount kafka-cluster-test/my-cluster-entity-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:98 - RoleBinding kafka-cluster-test/strimzi-my-cluster-entity-topic-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:164 - RoleBinding strimzi-my-cluster-entity-topic-operator in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:98 - RoleBinding kafka-cluster-test/strimzi-my-cluster-entity-user-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:164 - RoleBinding strimzi-my-cluster-entity-user-operator in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:98 - ConfigMap kafka-cluster-test/my-cluster-entity-topic-operator-config already exists, patching it
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:47 - ConfigMap my-cluster-entity-topic-operator-config in namespace kafka-cluster-test has not been patched because resources are equal
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:98 - ConfigMap kafka-cluster-test/my-cluster-entity-user-operator-config already exists, patching it
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:47 - ConfigMap my-cluster-entity-user-operator-config in namespace kafka-cluster-test has not been patched because resources are equal
      2019-07-12 11:25:37 DEBUG SecretOperator:98 - Secret kafka-cluster-test/my-cluster-entity-operator-certs already exists, patching it
      2019-07-12 11:25:37 DEBUG SecretOperator:164 - Secret my-cluster-entity-operator-certs in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG DeploymentOperator:98 - Deployment kafka-cluster-test/my-cluster-entity-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG DeploymentOperator:164 - Deployment my-cluster-entity-operator in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Deployment resource my-cluster-entity-operator in namespace kafka-cluster-test to get ready
      2019-07-12 11:25:37 DEBUG Util:52 - Deployment resource my-cluster-entity-operator in namespace kafka-cluster-test is ready
      2019-07-12 11:25:37 DEBUG StatusDiff:37 - Ignoring Status diff {"op":"replace","path":"/conditions/0/lastTransitionTime","value":"2019-07-12T11:25:37+0000"}
      2019-07-12 11:25:37 DEBUG StatusDiff:42 - Status differs: {"op":"replace","path":"/observedGeneration","value":2}
      2019-07-12 11:25:37 DEBUG StatusDiff:43 - Current Status path /observedGeneration has value 1
      2019-07-12 11:25:37 DEBUG StatusDiff:44 - Desired Status path /observedGeneration has value 2
      2019-07-12 11:25:37 DEBUG KafkaAssemblyOperator:394 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Completed status update
      2019-07-12 11:25:37 DEBUG KafkaAssemblyOperator:206 - Status for my-cluster is up to date
      2019-07-12 11:25:37 DEBUG AbstractAssemblyOperator:188 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Lock lock::kafka-cluster-test::Kafka::my-cluster released
      2019-07-12 11:25:37 INFO  AbstractAssemblyOperator:317 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Assembly reconciled
      

      There are two problems:

      • An exception from Fabric8 is printed to the log (it is not thrown to use, is is caught inside Fabric8 and logged)
      • The scale() method should wait for the scaling but it doesn't. And we do not seem to do any further check for readiness - so we pretty much finish the reconciliation without the new broker being ready which is wrong.

      Attachments

        Activity

          People

            jstejska@redhat.com Jakub Stejskal
            jstejska@redhat.com Jakub Stejskal
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: