  1. AMQ Streams
  2. ENTMQST-4414

KafkaRoller rewrite


    • Kafka Roller 2.0
    • To Do
    • 80% To Do, 20% In Progress, 0% Done


      The existing KafkaRoller suffers from the following shortcomings:

      • It doesn’t understand log recovery
        • It treats log recovery as a failure and restarts the broker again
      • It doesn’t take partition leadership into account
        • It should let the broker resume preferred leadership after rolling
      • Its availability check KafkaAvailability is too resource-intensive
        • It holds all topic descriptions in memory at once
      • It’s difficult to reason about
      • For KRaft, we may need logic for process.roles=controller and process.roles=broker,controller
      • Large clusters: duration of rolling restarts becomes a problem
      • Tension with Cruise Control e.g. over leadership
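
The KRaft item above implies that the roller has to tell controller-only nodes apart from combined broker/controller nodes. A minimal sketch of that distinction follows. NodeRole and NodeRoles are hypothetical names invented for illustration, not existing Strimzi classes, and which checks should apply to each kind of node is still an open design question.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical type for illustration; not an existing Strimzi class. */
enum NodeRole { BROKER, CONTROLLER }

/** Hypothetical helper that interprets a node's process.roles value. */
final class NodeRoles {
    private NodeRoles() { }

    /** Parses a value such as "broker,controller"; null or blank yields an empty set. */
    static Set<NodeRole> parse(String processRoles) {
        Set<NodeRole> roles = EnumSet.noneOf(NodeRole.class);
        if (processRoles == null || processRoles.trim().isEmpty()) {
            return roles;
        }
        for (String role : processRoles.split(",")) {
            // valueOf throws IllegalArgumentException for an unrecognised role name
            roles.add(NodeRole.valueOf(role.trim().toUpperCase(Locale.ROOT)));
        }
        return roles;
    }

    /** True when the node declares the controller role and nothing else. */
    static boolean isControllerOnly(Set<NodeRole> roles) {
        return roles.equals(EnumSet.of(NodeRole.CONTROLLER));
    }

    /** True when the node declares both the broker and the controller role. */
    static boolean isCombined(Set<NodeRole> roles) {
        return roles.containsAll(EnumSet.of(NodeRole.BROKER, NodeRole.CONTROLLER));
    }
}
```

With this in place, the roller could branch on NodeRoles.isControllerOnly(...) and NodeRoles.isCombined(...) before deciding which health checks to run for a node.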

      Kafka Roller 2.0

      A new KafkaRoller design has been proposed and prototyped by tbentley-1, with the following aspects:

      Enhanced Broker Status Reflection

      • An observation abstraction is introduced that collects information about a broker from different sources, such as Kubernetes (Pod status etc.), the Kafka Admin API (ISR state etc.) and metrics endpoints.
      • A dedicated Kafka broker metrics endpoint is exposed that is better able to understand and report the broker state (e.g. in_log_recovery).
      • These rich observations enable a sophisticated classification of the broker status that more accurately reflects its state.
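
As a rough illustration of the observe-then-classify idea above, the sketch below combines facts from the three sources into one snapshot and maps it to a state. The type names, fields and state names are all assumptions made for this example and do not come from the prototype. The key point it shows is that a broker in log recovery gets its own state instead of being treated as failed.

```java
/** Hypothetical broker states; the prototype's actual state names may differ. */
enum BrokerState { NOT_RUNNING, RECOVERING, NOT_READY, CATCHING_UP, SERVING }

/** Hypothetical snapshot of one broker, combining several observation sources. */
final class Observation {
    final boolean podRunning;     // assumed to come from the Kubernetes Pod status
    final boolean podReady;       // assumed to come from the Kubernetes readiness condition
    final boolean inLogRecovery;  // assumed to come from the dedicated broker metrics endpoint
    final boolean inAllIsrs;      // assumed to come from the Kafka Admin API (ISR state)

    Observation(boolean podRunning, boolean podReady, boolean inLogRecovery, boolean inAllIsrs) {
        this.podRunning = podRunning;
        this.podReady = podReady;
        this.inLogRecovery = inLogRecovery;
        this.inAllIsrs = inAllIsrs;
    }
}

/** Hypothetical classifier: a pure function from an observation to a state. */
final class BrokerClassifier {
    private BrokerClassifier() { }

    static BrokerState classify(Observation o) {
        if (!o.podRunning) {
            return BrokerState.NOT_RUNNING;
        }
        if (o.inLogRecovery) {
            // Checked before readiness so that recovery is not mistaken for a failure.
            return BrokerState.RECOVERING;
        }
        if (!o.podReady) {
            return BrokerState.NOT_READY;
        }
        return o.inAllIsrs ? BrokerState.SERVING : BrokerState.CATCHING_UP;
    }
}
```

Keeping classify free of side effects means it can be tested without a cluster: each test just constructs an Observation and checks the resulting state.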

      Repeatable, Predictable Rolling

      • The set of broker states forms a state machine that reflects the available states and the possible transitions between them.
      • The roller would be responsible for continuously collecting observations, classifying the broker state, and reconciling brokers in unhealthy states via the defined state machine transitions.
      • A processor abstraction is introduced for processing state transitions.
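
One possible shape for the state machine and processor described above is sketched below. The states, the transition table and the Processor interface are illustrative assumptions, not the prototype's actual design. The sketch only shows the principle: a processor proposes the next state and the state machine rejects any transition that has not been defined.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical states; the real set would come from the design proposal. */
enum RollerState { UNKNOWN, NOT_READY, RECOVERING, RESTARTED, SERVING }

/** Hypothetical processor: decides the next state for a broker in a given state. */
interface Processor {
    RollerState process(int brokerId, RollerState current);
}

final class RollerStateMachine {
    // Assumed transition table, for illustration only.
    private static final Map<RollerState, Set<RollerState>> ALLOWED = new EnumMap<>(RollerState.class);

    static {
        ALLOWED.put(RollerState.UNKNOWN,
                EnumSet.of(RollerState.NOT_READY, RollerState.RECOVERING, RollerState.SERVING));
        ALLOWED.put(RollerState.NOT_READY,
                EnumSet.of(RollerState.RESTARTED, RollerState.RECOVERING, RollerState.SERVING));
        // A recovering broker can never move straight to RESTARTED.
        ALLOWED.put(RollerState.RECOVERING,
                EnumSet.of(RollerState.SERVING, RollerState.NOT_READY));
        ALLOWED.put(RollerState.RESTARTED,
                EnumSet.of(RollerState.RECOVERING, RollerState.NOT_READY, RollerState.SERVING));
        ALLOWED.put(RollerState.SERVING,
                EnumSet.of(RollerState.NOT_READY, RollerState.RESTARTED));
    }

    private RollerStateMachine() { }

    static boolean canTransition(RollerState from, RollerState to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(RollerState.class)).contains(to);
    }

    /** Runs one reconciliation step and rejects transitions the table does not allow. */
    static RollerState step(int brokerId, RollerState current, Processor processor) {
        RollerState next = processor.process(brokerId, current);
        if (next != current && !canTransition(current, next)) {
            throw new IllegalStateException(
                    "Broker " + brokerId + ": illegal transition " + current + " -> " + next);
        }
        return next;
    }
}
```

For example, a processor written as `(id, s) -> s == RollerState.NOT_READY ? RollerState.RESTARTED : s` restarts brokers that are not ready and leaves recovering brokers alone, while a processor that tried to restart a recovering broker would be rejected by step.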


      • Create a Strimzi proposal upstream
      • Work on the existing prototype to reach a functional PoC
        • The PoC introduced clear interfaces between observe, classify and process, which allow work to proceed in parallel using a test-driven approach for validation.
        • Expose dedicated broker metrics via a broker-side component (Java agent, metrics reporter, etc.)
      • Develop an alpha version of Kafka Roller 2.0 in Strimzi behind a feature flag, to be driven towards a GA feature.
      • Thorough test coverage
        • Leverage property-based testing for correctness validation
        • Define and implement a full set of test cases, starting with golden path tests to establish the testing framework and structure
        • Continue defining further test cases

            rh-ee-gselenge Gantigmaa Selenge
            ahmedabdalla Ahmed Abdalla
