Debezium / DBZ-152

Add configuration option to always use streaming resultset during snapshot

    Details

      Description

      We are testing Debezium on some fairly large tables (a couple of hundred million rows), and our production databases are even bigger. We have noticed that Debezium seems to hang for quite some time before it starts the actual snapshot of each table.

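      Judging from a couple of thread dumps, the hang appears to be caused by the `SELECT COUNT(*) FROM <table>` query in `io.debezium.connector.mysql.SnapshotReader.execute`. This kind of query can be very slow on large InnoDB tables, because InnoDB does not keep an exact row count and has to scan an index to compute one.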

      It would be great to have a configuration option to always use a streaming result set (and so skip the `SELECT COUNT(*)` query), or alternatively to obtain an approximate table size more cheaply.
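
      A minimal sketch of what such an option could look like, assuming a hypothetical threshold setting (called `minRowsToStream` here; neither that name nor the `SnapshotResultSetMode` class is an existing Debezium API). A non-positive value means "always stream", so the row-count supplier is never invoked and no `SELECT COUNT(*)` is issued. The streaming statement follows the MySQL Connector/J convention of a forward-only, read-only statement with a fetch size of `Integer.MIN_VALUE`.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.LongSupplier;

// Hypothetical sketch; this is not Debezium's actual SnapshotReader code.
final class SnapshotResultSetMode {

    // Hypothetical option: tables with at least this many rows are read with a
    // streaming result set. A value <= 0 means "always stream" and skips counting.
    private final long minRowsToStream;

    SnapshotResultSetMode(long minRowsToStream) {
        this.minRowsToStream = minRowsToStream;
    }

    // Decides whether to stream. The supplier (an exact COUNT(*) or an estimate)
    // is only called when the configured threshold actually needs a row count.
    boolean useStreaming(LongSupplier rowCount) {
        if (minRowsToStream <= 0) {
            return true; // forced streaming: no row-count query at all
        }
        return rowCount.getAsLong() >= minRowsToStream;
    }

    // Creates the statement used to read a table. With MySQL Connector/J, a
    // forward-only, read-only statement with fetch size Integer.MIN_VALUE streams
    // rows one at a time instead of buffering the whole result set in memory.
    // Other JDBC drivers generally reject a negative fetch size.
    static Statement createStatement(Connection conn, boolean streaming) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        try {
            if (streaming) {
                stmt.setFetchSize(Integer.MIN_VALUE);
            }
            return stmt;
        } catch (SQLException e) {
            stmt.close(); // do not leak the statement if configuration fails
            throw e;
        }
    }
}
```

      With this shape, a threshold of 0 gives the "always stream" behaviour requested here, while a positive threshold keeps a count-based decision that could be fed a cheaper estimate instead of an exact count.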

      For example, MySQL's `SHOW TABLE STATUS LIKE '<tableName>'` returns an approximate row count (the `Rows` column, which is only an estimate for InnoDB tables); that would probably be good enough for this use case.
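
      A sketch of reading that estimate over JDBC, assuming the table's database name is known and the server runs with the default SQL mode (`NO_BACKSLASH_ESCAPES` off). The `ApproximateRowCount` class and its methods are hypothetical. Because the argument of `LIKE` is a pattern, `_` and `%` in a table name such as `order_items` have to be escaped, otherwise the statement can match other tables; `Rows` may also be `NULL`, for example for a view.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.OptionalLong;

// Hypothetical helper: estimates a table's row count via SHOW TABLE STATUS
// instead of an exact (and, on large InnoDB tables, slow) SELECT COUNT(*).
final class ApproximateRowCount {

    // Quotes a database name as a MySQL identifier: wrap in backticks, double any backtick.
    static String quoteIdentifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Builds a LIKE string literal that matches only tableName: '_' and '%' are
    // escaped with a backslash and single quotes are doubled. Names containing a
    // backslash are rejected to keep the escaping simple.
    static String likeLiteral(String tableName) {
        if (tableName.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("backslash in table name not supported: " + tableName);
        }
        String escaped = tableName
                .replace("_", "\\_")
                .replace("%", "\\%")
                .replace("'", "''");
        return "'" + escaped + "'";
    }

    // Returns the Rows value of the first matching row (approximate for InnoDB),
    // or empty if no table matches or Rows is NULL.
    static OptionalLong estimateRows(Connection conn, String database, String tableName) throws SQLException {
        String sql = "SHOW TABLE STATUS FROM " + quoteIdentifier(database) + " LIKE " + likeLiteral(tableName);
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            if (!rs.next()) {
                return OptionalLong.empty();
            }
            long rows = rs.getLong("Rows");
            return rs.wasNull() ? OptionalLong.empty() : OptionalLong.of(rows);
        }
    }
}
```

      For InnoDB the estimate comes from sampled index statistics and can differ considerably from the real count, so it suits a "buffer or stream?" decision but not anything that needs an exact number.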

              People

              • Assignee: Randall Hauch (rhauch)
              • Reporter: Dennis Persson (depe)
              • Votes: 0
              • Watchers: 3