Debezium / DBZ-1766

Postgres Connector losing data on restart due to commit() being called before events produced to Kafka


Details

    Description

      We've run into an issue where events are lost when replicating from Postgres to Kafka if Kafka Connect restarts. This happens because `commit()` gets called before the events returned from `poll()` have actually been produced to Kafka. If Kafka Connect is terminated before some of those events are produced, then on restart the connector resumes from the committed offset, and the events that were polled but never produced are skipped. This leads to data loss.

      The underlying problem, these callbacks being invoked out of order, is tracked as an open issue against Kafka Connect: https://issues.apache.org/jira/browse/KAFKA-5716
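
To make the ordering problem concrete, here is a minimal toy model of the race. It is only a sketch: `ToyTask`, `committedPosition`, `trackAcks`, and `lostEventsAfterCrash` are hypothetical names made up for illustration, no Kafka Connect or Debezium classes are used, and the "ack-tracking" variant is just one possible mitigation, not necessarily how the connector does or should handle it. The method names `poll`, `commitRecord`, and `commit` only mirror the callbacks discussed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the commit-before-produce race; all names are hypothetical. */
public class CommitOrderingSketch {

    /** Stand-in for a source task that reads positions 1, 2, 3, ... from a log. */
    public static class ToyTask {
        private final boolean trackAcks; // false = behaviour described in this issue
        private long nextPosition = 1;
        private long lastPolled = 0;     // highest position handed out by poll()
        private long lastAcked = 0;      // highest position confirmed as produced
        long committedPosition = 0;      // a restarted task resumes after this position

        public ToyTask(boolean trackAcks) {
            this.trackAcks = trackAcks;
        }

        /** Hands the next batch of positions to the framework. */
        public List<Long> poll(int batchSize) {
            List<Long> batch = new ArrayList<>();
            for (int i = 0; i < batchSize; i++) {
                batch.add(nextPosition++);
            }
            lastPolled = nextPosition - 1;
            return batch;
        }

        /** Called once a single event has actually been produced. */
        public void commitRecord(long position) {
            lastAcked = Math.max(lastAcked, position);
        }

        /** Called by the framework, possibly before the whole batch was produced. */
        public void commit() {
            committedPosition = trackAcks ? lastAcked : lastPolled;
        }
    }

    /** Polls 3 events, produces only the first, commits, then "crashes". */
    public static int lostEventsAfterCrash(boolean trackAcks) {
        ToyTask task = new ToyTask(trackAcks);
        List<Long> batch = task.poll(3);
        task.commitRecord(batch.get(0)); // only event 1 reached Kafka
        task.commit();                   // commit() arrives early
        // Crash here: events 2 and 3 were never produced.
        int lost = 0;
        for (long position : batch) {
            boolean produced = position <= 1;
            boolean replayedOnRestart = position > task.committedPosition;
            if (!produced && !replayedOnRestart) {
                lost++;
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("commit last polled: lost " + lostEventsAfterCrash(false));
        System.out.println("commit last acked:  lost " + lostEventsAfterCrash(true));
    }
}
```

In this model, committing the last polled position loses the two unproduced events, while committing only what has been acknowledged lets them be replayed after the restart.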

      Here's a repo with steps to reproduce the issue: https://github.com/mmarvick-convoy/debezium-restart-issue

      Note that the repro steps use a custom build of Debezium with some additional logging in the commit and poll callbacks. You can edit the Kafka Connect Dockerfile to use the latest stable version of Debezium 1.0 instead of the custom build, and the issue still reproduces. We've been running Debezium 0.9.5 and can reproduce it there as well, so it's not a recent regression. Because of the underlying issue in Kafka Connect, this has probably been a problem from the beginning.


          People

            Assignee: Unassigned
            Reporter: Michael Marvick (mmarvick-convoy, inactive)