Debezium / DBZ-1169

Support Outbox SMT as part of Debezium core

      Description

      The Outbox Event Router SMT

      Given the following event list

      event_id  event_type      event_key  aggregate_type  aggregate_id  payload
      2d71b7e9  UserCreated     9a82a356   User            9a82a356      {"id": "9a82a356", "username": "renatomefi"}
      55fac11e  UserEmailAdded  9a82a356   UserEmail       db216514      {"user_id": "9a82a356", "email": "gh@mefi.in"}
      c54f644f  UserUpdated     9a82a356   User            9a82a356      {"id": "9a82a356", "username": "renatomefi", "name": "Renato"}

      Scenario: Keep all events in the same topic but manage the key

      In this scenario all events are published to the same topic, and the message key is taken from a configurable field.

      This is useful when you publish different aggregates but want to preserve ordering across them, since events that share a key land in the same partition.

      For example:

      // topic: user.events key: "9a82a356"
      {
          "id": "2d71b7e9",
          "type": "UserCreated",
          "User": {"id": "9a82a356", "username": "renatomefi"}
      }
      
      // topic: user.events key: "9a82a356"
      {
          "id": "55fac11e",
          "type": "UserEmailAdded",
          "UserEmail": {"user_id": "9a82a356", "email": "gh@mefi.in"}
      }
      
      // topic: user.events key: "9a82a356"
      {
          "id": "c54f644f",
          "type": "UserUpdated",
          "User": {"id": "9a82a356", "username": "renatomefi", "name": "Renato"}
      }
      

      Suggested configuration:

      {
          // ...
          "transforms": "OutboxEventRouter",
          "transforms.OutboxEventRouter.type": "io.debezium.outbox.transforms.EventRouter",
          "transforms.OutboxEventRouter.event.key": "event_key",
          "transforms.OutboxEventRouter.route.field": null,
          "transforms.OutboxEventRouter.route.prefix": "user.",
          "transforms.OutboxEventRouter.route.suffix": ".events"
      }
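
To illustrate why choosing `event_key` as the message key keeps related events in order, here is a minimal sketch of key-based partitioning. It is not Kafka's actual partitioner: `partition_for` is a hypothetical helper, `zlib.crc32` stands in for the real hash function, and the partition count of 12 is arbitrary. The point is only that equal keys always map to the same partition.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for a producer's key hashing; crc32 is used only because it is
    # deterministic and available in the standard library.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The three outbox rows from the event list above (payload omitted).
events = [
    {"event_id": "2d71b7e9", "event_key": "9a82a356", "aggregate_id": "9a82a356"},
    {"event_id": "55fac11e", "event_key": "9a82a356", "aggregate_id": "db216514"},
    {"event_id": "c54f644f", "event_key": "9a82a356", "aggregate_id": "9a82a356"},
]

# Keyed by event_key, every event hashes to a single partition, so a consumer
# reads UserCreated, UserEmailAdded and UserUpdated in the order they were written.
partitions_by_event_key = {partition_for(e["event_key"], 12) for e in events}
print(len(partitions_by_event_key))
```

Keyed by `aggregate_id` instead, the `UserEmailAdded` event has a different key and may end up in another partition, in which case its position relative to the `User` events is no longer guaranteed.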
      

      Scenario: Route events per type

      This is not a great example with the dataset I'm using, but it can be useful when the publisher always publishes snapshots or whole facts instead of granular events.

      // topic: user.UserCreated.events key: "9a82a356"
      {
          "id": "2d71b7e9",
          "type": "UserCreated",
          "User": {"id": "9a82a356", "username": "renatomefi"}
      }
      
      // topic: user.UserEmailAdded.events key: "db216514"
      {
          "id": "55fac11e",
          "type": "UserEmailAdded",
          "UserEmail": {"user_id": "9a82a356", "email": "gh@mefi.in"}
      }
      
      // topic: user.UserUpdated.events key: "9a82a356"
      {
          "id": "c54f644f",
          "type": "UserUpdated",
          "User": {"id": "9a82a356", "username": "renatomefi", "name": "Renato"}
      }
      

      Suggested configuration:

      {
          // ...
          "transforms": "OutboxEventRouter",
          "transforms.OutboxEventRouter.type": "io.debezium.outbox.transforms.EventRouter",
          "transforms.OutboxEventRouter.event.key": "aggregate_id",
          "transforms.OutboxEventRouter.route.field": "event_type",
          "transforms.OutboxEventRouter.route.prefix": "user.",
          "transforms.OutboxEventRouter.route.suffix": ".events"
      }
      

      Scenario: Route events per aggregate

      // topic: accounts.User.events key: "9a82a356"
      {
          "id": "2d71b7e9",
          "type": "UserCreated",
          "User": {"id": "9a82a356", "username": "renatomefi"}
      }
      
      // topic: accounts.User.events key: "9a82a356"
      {
          "id": "c54f644f",
          "type": "UserUpdated",
          "User": {"id": "9a82a356", "username": "renatomefi", "name": "Renato"}
      }
      
      // topic: accounts.UserEmail.events key: "db216514"
      {
          "id": "55fac11e",
          "type": "UserEmailAdded",
          "UserEmail": {"user_id": "9a82a356", "email": "gh@mefi.in"}
      }
      

      Suggested configuration:

      {
          // ...
          "transforms": "OutboxEventRouter",
          "transforms.OutboxEventRouter.type": "io.debezium.outbox.transforms.EventRouter",
          "transforms.OutboxEventRouter.event.key": "aggregate_id",
          "transforms.OutboxEventRouter.route.field": "aggregate_type",
          "transforms.OutboxEventRouter.route.prefix": "accounts.",
          "transforms.OutboxEventRouter.route.suffix": ".events"
      }
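
The three scenarios above differ only in which field supplies the key and which field (if any) is placed between the prefix and the suffix. A minimal sketch of that logic follows. The `route` function and its parameter names are hypothetical and are not part of any existing Debezium API. One assumption worth calling out: when no route field is set, the sketch joins prefix and suffix with a single dot, because plain concatenation of "user." and ".events" would give "user..events" rather than the "user.events" shown in the first scenario.

```python
from typing import Optional, Tuple


def route(row: dict, key_field: str, route_field: Optional[str],
          prefix: str = "", suffix: str = "") -> Tuple[str, str]:
    """Return (topic, key) for one outbox row."""
    if route_field is None:
        # Assumption: join prefix and suffix with a single dot so that
        # "user." + ".events" yields "user.events", not "user..events".
        parts = [p.strip(".") for p in (prefix, suffix)]
        topic = ".".join(p for p in parts if p)
    else:
        topic = f"{prefix}{row[route_field]}{suffix}"
    return topic, row[key_field]


user_created = {
    "event_id": "2d71b7e9", "event_type": "UserCreated", "event_key": "9a82a356",
    "aggregate_type": "User", "aggregate_id": "9a82a356",
}
user_email_added = {
    "event_id": "55fac11e", "event_type": "UserEmailAdded", "event_key": "9a82a356",
    "aggregate_type": "UserEmail", "aggregate_id": "db216514",
}

# Same topic, key taken from event_key
print(route(user_email_added, "event_key", None, "user.", ".events"))
# Routed per event type
print(route(user_created, "aggregate_id", "event_type", "user.", ".events"))
# Routed per aggregate
print(route(user_email_added, "aggregate_id", "aggregate_type", "accounts.", ".events"))
```

Keeping the field names as arguments rather than constants matches the goal of not forcing a naming standard on the outbox table.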
      

      Copy events to another topic

      One possibility I envision is that you want a topic with all the events from a given service, but also want to copy events matching a certain event_type or aggregate_type to another topic.

      For instance, if a consumer is only interested in a certain aggregate and would otherwise have to skip 90% of the messages, it might be worth creating a dedicated topic for it.

      As in this article:

      > 4. Look at the number of topics that a consumer needs to subscribe to. If several consumers all read a particular group of topics, this suggests that maybe those topics should be combined. If you combine the fine-grained topics into coarser-grained ones, some consumers may receive unwanted events that they need to ignore. That is not a big deal: consuming messages from Kafka is very cheap, so even if a consumer ends up ignoring half of the events, the cost of this overconsumption is probably not significant. Only if the consumer needs to ignore the vast majority of messages (e.g. 99.9% are unwanted) would I recommend splitting the low-volume event stream from the high-volume stream.

      Extract events to another topic

      Similar to the situation above: all events would go to a single topic, with the possibility of routing a specific one to a separate topic instead. In a sense it would act as a whitelist/blacklist.
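
The difference between copying and extracting can be sketched as a function that returns the list of topics an event should be written to. Everything here is hypothetical: the `destinations` function, the `extra_routes` mapping of (field, value) pairs to topic names, and the topic names "accounts.events" and "accounts.emails" are made up for illustration and do not correspond to existing configuration options.

```python
from typing import Dict, List, Tuple


def destinations(row: dict, main_topic: str,
                 extra_routes: Dict[Tuple[str, str], str],
                 mode: str = "copy") -> List[str]:
    """Return every topic the given outbox row should be published to."""
    matched = [topic for (field, value), topic in extra_routes.items()
               if row.get(field) == value]
    if mode == "copy":
        # The event stays in the main topic and is duplicated to each match.
        return [main_topic] + matched
    if mode == "extract":
        # The event is moved: it only goes to the main topic if nothing matched.
        return matched or [main_topic]
    raise ValueError(f"unknown mode: {mode!r}")


email_row = {"event_type": "UserEmailAdded", "aggregate_type": "UserEmail"}
user_row = {"event_type": "UserCreated", "aggregate_type": "User"}
routes = {("aggregate_type", "UserEmail"): "accounts.emails"}

print(destinations(email_row, "accounts.events", routes, mode="copy"))
print(destinations(email_row, "accounts.events", routes, mode="extract"))
print(destinations(user_row, "accounts.events", routes, mode="extract"))
```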

      Observations, questions, decisions

      There are a lot of assumptions here, and I'm curious to hear your opinion:

      • The aggregate_type becomes the name of the field that holds the payload, like {"User": {"id": "9a82a356", "username": "renatomefi"}} for the User aggregate. That could help when generating the schema for a given event.
      • The event looks like:
          {
              "id": "2d71b7e9", // Unique event id, can be used for consumers to avoid duplication, EOS and more
              "type": "UserCreated", // Type of event defined by the producer
              "User": { // This key is taken from `aggregate_type` and represents a schema
                  "id": "9a82a356", // payload..
                  "username": "renatomefi" // payload..
              }
          }
          
      • Should we add an event published date? Should it be set in the table by the application, to represent the moment the event actually happened, or by Debezium when it picks up the event? The former seems more accurate and can also be used to detect processing delays.
      • The payload isn't an encoded JSON string but an actual object. Could that cause issues? Should it be optional?
      • event_key and aggregate_id look redundant. Maybe we should keep just one of them, since the user can configure which field to use as the message key.
      • Should the key be set via ValueToKey SMT?
      • Both prefix and suffix could be replaced by the official RegexRouter SMT:
              "transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
              "transforms.renameTopic.regex": "(.*)",
              "transforms.renameTopic.replacement": "user.$1.events"
          
      • Copying to another topic could be done via Mirus (with some example configuration documented on our side) or via another SMT which produces the event twice, to different topics. Ideas?
      • All fields used in the transform should be configurable, to avoid forcing users to adhere to a certain standard (naming, with or without _, etc.).
      • How should we deal with messages for operations other than c (create)? I suggest letting the user choose a handler, which can either throw an exception (stopping the task) or ignore the message (while logging a warning).
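
The event shape and the handling of non-create operations discussed in the list above can be sketched together. This is only an illustration under several assumptions: the input is a simplified Debezium-style change event with an "op" code and an "after" row, the column names are the ones from the example table (hard-coded here for brevity, although the proposal is to make them configurable), and `to_event` and its `on_other_op` parameter are hypothetical names.

```python
import json
import logging
from typing import Optional

log = logging.getLogger("outbox_sketch")


def to_event(change: dict, on_other_op: str = "warn") -> Optional[dict]:
    """Turn a change event for an outbox row into the proposed event shape."""
    op = change["op"]
    if op != "c":
        # Only inserts into the outbox table are meaningful as events.
        if on_other_op == "fail":
            raise ValueError(f"unexpected operation {op!r} on outbox table")
        log.warning("ignoring outbox change with operation %r", op)
        return None

    row = change["after"]
    payload = row["payload"]
    if isinstance(payload, str):
        # Emit the payload as an object rather than an encoded JSON string.
        payload = json.loads(payload)

    return {
        "id": row["event_id"],           # unique event id, usable for deduplication
        "type": row["event_type"],       # event type defined by the producer
        row["aggregate_type"]: payload,  # payload stored under the aggregate name
    }


created = {
    "op": "c",
    "after": {
        "event_id": "2d71b7e9", "event_type": "UserCreated",
        "aggregate_type": "User",
        "payload": '{"id": "9a82a356", "username": "renatomefi"}',
    },
}
print(to_event(created))
print(to_event({"op": "d", "after": None}))
```

With `on_other_op="fail"` the same delete would raise instead of being skipped, which corresponds to the stricter of the two handler behaviours suggested above.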

      Final notes

      Does this approach make sense, and would it be valuable to include in Debezium?

      These scenarios came up after I analyzed what we'd need in the different microservices where we'll run the outbox: some are event sourced, some follow DDD, some don't, some only produce snapshots and others can produce deltas. So I thought about making the process less opinionated and giving the user the power to configure the fields with any name, route based on a field, choose the key, and so on.

      Thanks!

              People

              • Assignee: Renato Mefi (renatomefi)
              • Reporter: Renato Mefi (renatomefi)