Debezium / DBZ-152

Add configuration option to always use streaming resultset during snapshot

Details

    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 0.3.6, 0.4
    • Affects Version/s: 0.3.4
    • Component/s: mysql-connector
    • None

    Description

      We are testing Debezium on some fairly large tables (a couple of hundred million rows), and our production databases are even bigger. We are noticing that Debezium seems to hang for quite some time before it starts the actual snapshot of each table.

      After taking a couple of thread dumps, the hang seems to be caused by the `SELECT COUNT(*) FROM <table>` query in io.debezium.connector.mysql.SnapshotReader.execute. This kind of query can be very slow on large InnoDB tables.

      It would be great to have a configuration option to always use the streaming resultset (and skip the select count query), or to optimize this step so that it gets an approximate table size faster.
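
      To make the request concrete, here is a minimal sketch of the decision it describes. It assumes the row count is used only to choose between a buffered and a streaming resultset. The class name, the `alwaysStream` flag, and the `streamingThreshold` value are hypothetical and are not existing Debezium settings.

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not part of Debezium. */
final class SnapshotSelectPolicy {
    private final boolean alwaysStream;     // hypothetical config flag
    private final long streamingThreshold;  // hypothetical row-count cutoff

    SnapshotSelectPolicy(boolean alwaysStream, long streamingThreshold) {
        this.alwaysStream = alwaysStream;
        this.streamingThreshold = streamingThreshold;
    }

    /** The count query is only worth running when it can change the decision. */
    boolean needsRowCount() {
        return !alwaysStream;
    }

    /** Streams when forced, when the count is unknown, or when it exceeds the cutoff. */
    boolean useStreaming(OptionalLong rowCount) {
        if (alwaysStream) {
            return true;
        }
        return !rowCount.isPresent() || rowCount.getAsLong() > streamingThreshold;
    }
}
```

      With the flag set, `needsRowCount()` returns false, so the caller can skip the count query entirely. For reference, MySQL Connector/J streams rows when the statement is forward-only and read-only and `setFetchSize(Integer.MIN_VALUE)` has been called on it.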

      For example, MySQL has a `show table status like <tableName>` statement that returns an approximate row count; perhaps that would be good enough for this use case.
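
      As a rough sketch of that idea, the estimate could be read over plain JDBC as shown here. The class and method names are hypothetical, the name check is deliberately strict to keep the example short, and the statement looks the table up in the connection's current default database.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not part of Debezium. */
final class ApproximateRowCount {
    private ApproximateRowCount() {
    }

    /** Builds the statement; '_' is a LIKE wildcard, so it is escaped as '\_'. */
    static String showTableStatusSql(String tableName) {
        // Assumption: only simple unquoted identifiers are supported in this sketch.
        if (!tableName.matches("[A-Za-z0-9_$]+")) {
            throw new IllegalArgumentException("Unsupported table name: " + tableName);
        }
        return "SHOW TABLE STATUS LIKE '" + tableName.replace("_", "\\_") + "'";
    }

    /** Returns the value of the Rows column, or empty if it is missing or NULL. */
    static OptionalLong estimate(Connection conn, String tableName) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(showTableStatusSql(tableName))) {
            if (rs.next()) {
                long rows = rs.getLong("Rows");
                if (!rs.wasNull()) {
                    return OptionalLong.of(rows);
                }
            }
            return OptionalLong.empty();
        }
    }
}
```

      For InnoDB the `Rows` value is only an estimate, so it suits a rough size check rather than an exact count.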

          People

            rhauch Randall Hauch (Inactive)
            depe_jira Dennis Persson (Inactive)