Details
Type: Task
Resolution: Unresolved
Priority: Major
Description
The idea is to export an id that uniquely identifies a given change event, based on its position in the source DB's log (e.g. the binlog offset in the case of MySQL). This would be a single field, e.g. derived by hashing all the (connector-specific) offset attributes. A single field that consumers can handle without knowing the connector-specific details allows them to detect duplicates. In particular, a sink connector can ignore duplicates by using INSERT queries that result in a no-op if the sink already contains a record with the same unique event id.
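As a rough illustration of the hashing idea, the sketch below derives one opaque id from a set of offset attributes. The function name `event_id`, the attribute names (`file`, `pos`, `row`) and the choice of SHA-256 over canonical JSON are assumptions made for this example only; they are not an existing Debezium API or a settled design.

```python
import hashlib
import json


def event_id(connector: str, offset: dict) -> str:
    """Derive a stable id from connector-specific offset attributes.

    Hypothetical helper: names and hashing scheme are illustrative.
    """
    # Canonical JSON (sorted keys, fixed separators) so that the same
    # offset always serializes to the same bytes, whatever the dict order.
    canonical = json.dumps(
        {"connector": connector, "offset": offset},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up MySQL-style offset attributes, for illustration only.
first = event_id("mysql", {"file": "mysql-bin.000003", "pos": 154, "row": 0})
# The same log position, exported a second time with a different key order.
replayed = event_id("mysql", {"row": 0, "pos": 154, "file": "mysql-bin.000003"})
# A different row within the same binlog event.
other = event_id("mysql", {"file": "mysql-bin.000003", "pos": 154, "row": 1})
```

Here `first` and `replayed` are equal while `other` differs, so a consumer can compare ids without knowing which attributes a given connector uses. Including the connector name in the hashed input is one way to keep ids from different connector types apart.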
Note that, unlike basing duplicate detection in sink connectors on Kafka topic offsets, basing it on the actual offset in the source DB ensures that duplicates can also be detected after a Debezium connector restart, as the same event exported a second time will have the same event id.
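A minimal sketch of the sink side, assuming each event carries such an id: the sink table uses the event id as its primary key, so a re-delivered event becomes a no-op. SQLite's `INSERT OR IGNORE` is used here only as a self-contained stand-in (PostgreSQL's `ON CONFLICT DO NOTHING` or MySQL's `INSERT IGNORE` would play the same role); the table name, the `apply_event` helper and the ids `e1`..`e4` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_sink ("
    " event_id TEXT PRIMARY KEY,"  # unique event id taken from the change event
    " payload TEXT NOT NULL)"
)


def apply_event(conn: sqlite3.Connection, event_id: str, payload: str) -> bool:
    """Insert the event; return True if written, False if it was a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO customers_sink (event_id, payload) VALUES (?, ?)",
        (event_id, payload),
    )
    # rowcount is 0 when the primary key already exists and the row is ignored.
    return cur.rowcount == 1


# Events exported before the (simulated) connector restart.
before_restart = [("e1", "insert A"), ("e2", "insert B"), ("e3", "update A")]
# After the restart the connector re-exports e2 and e3, then continues with e4.
# These records sit at new Kafka offsets but keep their original event ids.
after_restart = [("e2", "insert B"), ("e3", "update A"), ("e4", "delete B")]

results = [apply_event(conn, eid, body) for eid, body in before_restart + after_restart]
row_count = conn.execute("SELECT COUNT(*) FROM customers_sink").fetchone()[0]
```

Of the six deliveries, only the four distinct events end up in the table. Deduplicating on Kafka topic offsets would not catch `e2` and `e3` here, because the re-exported copies are new records in the topic.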
This attribute could go into the "source" structure or alternatively be conveyed as a header property.