AMQ Streams / ENTMQST-3970

The internal KafkaConnect topics are recreated with invalid configuration

    • Bug
    • Resolution: Obsolete
    • Major
    • 2.7.0.GA
    • 2.1.0.GA
    • topic-operator
    • False
    • None
    • False

      The customer is seeing the following KafkaConnect error after an automatic upgrade on OpenShift 4:

      2022-04-07 20:47:57,406 ERROR [Worker clientId=connect-1, groupId=ircc-connect-cluster] Uncaught exception in herder work thread, exiting:  (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1-1]
      org.apache.kafka.common.config.ConfigException: Topic 'connect-cluster-offsets' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets, but found the topic currently has 'cleanup.policy=delete'. Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future. Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
      

      Everything works as expected when KafkaConnect is deployed after the Kafka cluster is up and running:

      $ kubectl get kt | grep connect-cluster                                                                                       
      connect-cluster-configs                                                                            my-cluster   1            3                    True
      connect-cluster-offsets                                                                            my-cluster   25           3                    True
      connect-cluster-status                                                                             my-cluster   5            3                    True
      
      $ kubectl get kt connect-cluster-offsets -o yaml | yq eval ".spec" -
      config:
        cleanup.policy: compact
      partitions: 25
      replicas: 3
      topicName: connect-cluster-offsets
      

      Instead, this is what happens when Kafka and KafkaConnect are reconciled concurrently after all pods and all KafkaTopic resources are deleted manually: the topics come back with 3 partitions each and an empty configuration.

      $ kubectl delete po --all && kubectl delete kt --all
      ...
      
      $ kubectl get kt | grep connect-cluster
      connect-cluster-configs                                          my-cluster   3            3                    
      connect-cluster-offsets                                          my-cluster   3            3                    
      connect-cluster-status                                           my-cluster   3            3                                     
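
The difference between the two listings can be flagged mechanically. The following is a minimal sketch, not part of the original report: the function name report_partition_drift is hypothetical, the expected partition counts are copied from the healthy listing above, and the partition count is assumed to be the third column of the `kubectl get kt` output.

```shell
# Hypothetical helper: reads `kubectl get kt` output on stdin and prints every
# Connect internal topic whose partition count (third column) differs from the
# values seen in the healthy listing (configs=1, offsets=25, status=5).
report_partition_drift() {
  awk '
    BEGIN {
      expected["connect-cluster-configs"] = 1
      expected["connect-cluster-offsets"] = 25
      expected["connect-cluster-status"]  = 5
    }
    ($1 in expected) && ($3 != expected[$1]) {
      printf "%s: expected %d partitions, found %s\n", $1, expected[$1], $3
    }
  '
}
```

It could be used as `kubectl get kt | report_partition_drift`; no output means the three topics match the healthy listing.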
      
      $ kubectl get kt connect-cluster-offsets -o yaml | yq eval ".spec" -
      config: {}
      partitions: 3
      replicas: 3
      topicName: connect-cluster-offsets
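
To tell the healthy and the broken state apart without reading the YAML by eye, the printed spec can be tested for the compaction policy. This is a minimal sketch under stated assumptions: the function name has_compact_policy is hypothetical, and the spec is assumed to arrive on stdin as YAML in the form printed by the yq command above.

```shell
# Hypothetical helper: reads a KafkaTopic spec (YAML, as printed by yq above)
# on stdin and succeeds only if it contains a line "cleanup.policy: compact".
has_compact_policy() {
  grep -Eq '^[[:space:]]*"?cleanup\.policy"?:[[:space:]]*"?compact"?[[:space:]]*$'
}
```

For example, `kubectl get kt connect-cluster-offsets -o yaml | yq eval ".spec" - | has_compact_policy || echo "connect-cluster-offsets is not compacted"` would print the warning for the spec shown above, where config is empty.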
      

      At this point, we can check the TopicOperator log to see what happened to, for example, the connect-cluster-offsets topic.
      Initially, the topic is only present in Kafka from the previous deployment (k8sTopic:null, kafkaTopic:nonnull), so the corresponding KafkaTopic resource needs to be created in K8s.

      2022-04-29 10:52:21,07830 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #100(initial kafka connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Reconciling topic connect-cluster-offsets, k8sTopic:null, kafkaTopic:nonnull, privateTopic:nonnull
      

      Then we have lots of invalid state store errors, which I think are responsible for the lost topic configuration.

      2022-04-29 10:54:30,63425 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #735(periodic -connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Reconciling topic connect-cluster-offsets, k8sTopic:null, kafkaTopic:nonnull, privateTopic:null
      
      2022-04-29 10:54:38,37543 ERROR [vert.x-eventloop-thread-0] K8sTopicWatcher:69 - Reconciliation #943(kube +connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Failure processing KafkaTopic watch event ADDED on resource connect-cluster-offsets with labels {strimzi.io/cluster=my-cluster}: The state store, topic-store, may have migrated to another instance.
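
To get a sense of how often this error occurs, the log can be filtered for the message quoted above. This is a minimal sketch, not part of the original report: the function name count_state_store_errors is hypothetical, and the log is assumed to be available on stdin, for example from a saved log file.

```shell
# Hypothetical helper: reads TopicOperator log lines on stdin and prints the
# number of lines that contain the state store error message quoted above.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper
# usable under `set -e` while still printing the count.
count_state_store_errors() {
  grep -cF 'The state store, topic-store, may have migrated to another instance' || true
}
```

With the log saved to a file (the name topic-operator.log is an assumption), it could be run as `count_state_store_errors < topic-operator.log`.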
      

      Finally, the topic is created, but with the wrong configuration.

      2022-04-29 10:54:38,37525 INFO  [kubernetes-ops-pool-11] CrdOperator:113 - Reconciliation #926(kube +connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Status of KafkaTopic connect-cluster-offsets in namespace test has been updated
      

      After that, the TopicOperator does not work anymore, as it is stuck with an invalid state store (restarting the pod does not seem to help).

            Unassigned
            Federico Valeri (rhn-support-fvaleri)