Details
-
Bug
-
Resolution: Done
-
Major
-
1.3.0.GA
-
None
Description
As reported by jstejska, when you try to scale Kafka cluster, you can often see following errors (especially if it uses external listener) and does rolling update before the scaleup:
2019-07-12 11:25:37 INFO KafkaSetOperator:75 - Scaling up to 4 replicas 2019-07-12 11:25:37 ERROR StatefulSetOperationsImpl:126 - 0/4 pod(s) ready for StatefulSet: my-cluster-kafka in namespace: kafka-cluster-test after waiting for 0 seconds so giving up 2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Endpoints resource my-cluster-kafka-bootstrap in namespace kafka-cluster-test to get ready 2019-07-12 11:25:37 ERROR StatefulSetOperationsImpl:115 - Error while waiting for resource to be scaled. io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [StatefulSet] with name: [my-cluster-kafka] in namespace: [kafka-cluster-test] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:161) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.internal.RollableScalableResourceOperation.lambda$waitUntilScaled$0(RollableScalableResourceOperation.java:94) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_212] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_212] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_212] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212] Caused by: java.io.InterruptedIOException: interrupted at okio.Timeout.throwIfReached(Timeout.java:146) ~[com.squareup.okio.okio-1.15.0.jar:?] at okio.Okio$2.read(Okio.java:137) ~[com.squareup.okio.okio-1.15.0.jar:?] at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[com.squareup.okio.okio-1.15.0.jar:?] at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354) ~[com.squareup.okio.okio-1.15.0.jar:?] at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226) ~[com.squareup.okio.okio-1.15.0.jar:?] at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$1(HttpClientUtils.java:100) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at okhttp3.RealCall.execute(RealCall.java:92) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:330) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:311) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:749) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217) ~[io.fabric8.kubernetes-client-4.3.0.jar:?] ... 9 more 2019-07-12 11:25:37 DEBUG Util:52 - Endpoints resource my-cluster-kafka-bootstrap in namespace kafka-cluster-test is ready 2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Endpoints resource my-cluster-kafka-brokers in namespace kafka-cluster-test to get ready 2019-07-12 11:25:37 DEBUG Util:52 - Endpoints resource my-cluster-kafka-brokers in namespace kafka-cluster-test is ready 2019-07-12 11:25:37 DEBUG ServiceAccountOperator:107 - ServiceAccount kafka-cluster-test/my-cluster-topic-operator does not exist, noop 2019-07-12 11:25:37 DEBUG RoleBindingOperator:107 - RoleBinding kafka-cluster-test/strimzi-my-cluster-topic-operator does not exist, noop 2019-07-12 11:25:37 DEBUG ConfigMapOperator:107 - ConfigMap kafka-cluster-test/my-cluster-topic-operator-config does not exist, noop 2019-07-12 11:25:37 DEBUG SecretOperator:107 - Secret kafka-cluster-test/my-cluster-topic-operator-certs does not exist, noop 2019-07-12 11:25:37 DEBUG DeploymentOperator:107 - Deployment kafka-cluster-test/my-cluster-topic-operator does not exist, noop 2019-07-12 11:25:37 DEBUG ServiceAccountOperator:98 - ServiceAccount kafka-cluster-test/my-cluster-entity-operator already exists, patching it 2019-07-12 11:25:37 DEBUG RoleBindingOperator:98 - RoleBinding kafka-cluster-test/strimzi-my-cluster-entity-topic-operator already exists, patching it 2019-07-12 11:25:37 DEBUG RoleBindingOperator:164 - RoleBinding strimzi-my-cluster-entity-topic-operator in namespace kafka-cluster-test has been patched 2019-07-12 11:25:37 DEBUG RoleBindingOperator:98 - RoleBinding kafka-cluster-test/strimzi-my-cluster-entity-user-operator already exists, patching it 2019-07-12 11:25:37 DEBUG RoleBindingOperator:164 - RoleBinding strimzi-my-cluster-entity-user-operator in namespace kafka-cluster-test has been patched 2019-07-12 11:25:37 DEBUG ConfigMapOperator:98 - ConfigMap kafka-cluster-test/my-cluster-entity-topic-operator-config already exists, patching it 2019-07-12 11:25:37 DEBUG ConfigMapOperator:47 - ConfigMap my-cluster-entity-topic-operator-config in namespace kafka-cluster-test has not been patched because resources are equal 2019-07-12 11:25:37 DEBUG ConfigMapOperator:98 - ConfigMap kafka-cluster-test/my-cluster-entity-user-operator-config already exists, patching it 2019-07-12 11:25:37 DEBUG ConfigMapOperator:47 - ConfigMap my-cluster-entity-user-operator-config in namespace kafka-cluster-test has not been patched because resources are equal 2019-07-12 11:25:37 DEBUG SecretOperator:98 - Secret kafka-cluster-test/my-cluster-entity-operator-certs already exists, patching it 2019-07-12 11:25:37 DEBUG SecretOperator:164 - Secret my-cluster-entity-operator-certs in namespace kafka-cluster-test has been patched 2019-07-12 11:25:37 DEBUG DeploymentOperator:98 - Deployment kafka-cluster-test/my-cluster-entity-operator already exists, patching it 2019-07-12 11:25:37 DEBUG DeploymentOperator:164 - Deployment my-cluster-entity-operator in namespace kafka-cluster-test has been patched 2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Deployment resource my-cluster-entity-operator in namespace kafka-cluster-test to get ready 2019-07-12 11:25:37 DEBUG Util:52 - Deployment resource my-cluster-entity-operator in namespace kafka-cluster-test is ready 2019-07-12 11:25:37 DEBUG StatusDiff:37 - Ignoring Status diff {"op":"replace","path":"/conditions/0/lastTransitionTime","value":"2019-07-12T11:25:37+0000"} 2019-07-12 11:25:37 DEBUG StatusDiff:42 - Status differs: {"op":"replace","path":"/observedGeneration","value":2} 2019-07-12 11:25:37 DEBUG StatusDiff:43 - Current Status path /observedGeneration has value 1 2019-07-12 11:25:37 DEBUG StatusDiff:44 - Desired Status path /observedGeneration has value 2 2019-07-12 11:25:37 DEBUG KafkaAssemblyOperator:394 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Completed status update 2019-07-12 11:25:37 DEBUG KafkaAssemblyOperator:206 - Status for my-cluster is up to date 2019-07-12 11:25:37 DEBUG AbstractAssemblyOperator:188 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Lock lock::kafka-cluster-test::Kafka::my-cluster released 2019-07-12 11:25:37 INFO AbstractAssemblyOperator:317 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Assembly reconciled
There are two problems:
- An exception from Fabric8 is printed to the log (it is not thrown to use, is is caught inside Fabric8 and logged)
- The scale() method should wait for the scaling but it doesn't. And we do not seem to do any further check for readiness - so we pretty much finish the reconciliation without the new broker being ready which is wrong.