Debezium / DBZ-1267

Incorrect ordering of capture instances of the same table


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 0.9.0.Final
    • Component/s: sqlserver-connector
    • Labels: None

    Description

      According to the SQL Server documentation of `cdc.change_tables`:

      start_lsn - binary(10) Log sequence number (LSN) representing the low endpoint for querying the change table.

      Suppose the schema of table T evolves: a new column is added. To handle this, a new capture instance is created. Initially, the new capture instance has a higher start_lsn than the old one.

      But what happens if the old capture instance is not dropped for longer than the log retention period? Then both capture instances have exactly the same start_lsn, because after the retention period both change tables contain the same rows (give or take some records where only the newly added column was modified; such a row exists only in the new capture instance).

      1> SELECT * FROM cdc.change_tables where capture_instance like 'Stores_XXXXXXXXXXXXXXXXXXXX%';
      2> GO
      object_id  version source_object_id capture_instance                          start_lsn              end_lsn supports_net_changes has_drop_pending role_name index_name                     filegroup_name create_date             partition_switch
      ---------- ------- ---------------- ----------------------------------------- ---------------------- ------- -------------------- ---------------- --------- ------------------------------ -------------- ----------------------- ----------------
      1478296326       0       1182627256 Stores_XXXXXXXXXXXXXXXXXXXX               0x0004A4820000ABCD0051 NULL                       0             NULL cdc_owner PK__Suppleme__69A0B800903E2D08 NULL           2018-11-30 13:13:12.530                1
      1835869607       0       1182627256 Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505 0x0004A4820000ABCD0051 NULL                       0             NULL cdc_owner PK__Suppleme__69A0B800903E2D08 NULL           2019-03-05 14:05:06.630                1
      

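      In that case Debezium may treat the old capture instance as the newer one (in fact, the order is undefined). The code below compares only start_lsn, and when the two values are equal it picks whichever instance happens to come first in the list. As a consequence, Debezium ignores the newly added column, so part of the data is lost.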

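      // currentTable is assumed to hold captures.get(0) at this point (assigned earlier in the method)
      if (captures.size() > 1) {
          ChangeTable futureTable;
          // Only start_lsn is compared. When both values are equal, compareTo() returns 0,
          // so the else branch runs and captures.get(0) becomes the "future" table,
          // whichever capture instance that happens to be.
          if (captures.get(0).getStartLsn().compareTo(captures.get(1).getStartLsn()) < 0) {
              futureTable = captures.get(1);
          }
          else {
              currentTable = captures.get(1);
              futureTable = captures.get(0);
          }
          // Read currentTable only up to futureTable's start_lsn, then switch to futureTable
          currentTable.setStopLsn(futureTable.getStartLsn());
          tables.add(futureTable);
          LOGGER.info("Multiple capture instances present for the same table: {} and {}", currentTable, futureTable);
      }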
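      To see the tie case in isolation, here is a minimal, self-contained sketch. `Capture`, `lsn()`, `compareLsn()` and `pickFuture()` are hypothetical stand-ins, not Debezium's API; `pickFuture()` copies only the start_lsn branch of the excerpt above, and comparing LSNs as unsigned 10-byte values is an assumption about how Debezium orders them. Feeding the two rows from the query output in both orders shows that the instance chosen as "future" depends only on list order, while a strictly higher start_lsn (hypothetical value) is picked correctly either way.

```java
import java.util.Arrays;
import java.util.List;

public class CaptureOrderingDemo {

    // Hypothetical stand-in for Debezium's ChangeTable: just a name and start_lsn.
    static final class Capture {
        final String name;
        final byte[] startLsn; // binary(10) value of cdc.change_tables.start_lsn

        Capture(String name, byte[] startLsn) {
            this.name = name;
            this.startLsn = startLsn;
        }
    }

    // Parses a SQL Server binary literal such as "0x0004A4820000ABCD0051" into bytes.
    static byte[] lsn(String hex) {
        String digits = hex.startsWith("0x") ? hex.substring(2) : hex;
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Assumption: LSNs are ordered as unsigned big-endian byte strings.
    static int compareLsn(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Copies the branch from the quoted snippet and returns the capture chosen as "future".
    static Capture pickFuture(List<Capture> captures) {
        if (compareLsn(captures.get(0).startLsn, captures.get(1).startLsn) < 0) {
            return captures.get(1);
        }
        return captures.get(0); // also taken when both start_lsn values are equal
    }

    public static void main(String[] args) {
        // Both rows from the query output share the same start_lsn.
        Capture oldCi = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX", lsn("0x0004A4820000ABCD0051"));
        Capture newCi = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505", lsn("0x0004A4820000ABCD0051"));

        System.out.println(pickFuture(Arrays.asList(oldCi, newCi)).name); // prints Stores_XXXXXXXXXXXXXXXXXXXX (wrong)
        System.out.println(pickFuture(Arrays.asList(newCi, oldCi)).name); // prints Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505

        // Hypothetical higher start_lsn, as right after the new instance is created: order no longer matters.
        Capture freshCi = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505", lsn("0x0004A4830000000A0001"));
        System.out.println(pickFuture(Arrays.asList(oldCi, freshCi)).name); // prints Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505
        System.out.println(pickFuture(Arrays.asList(freshCi, oldCi)).name); // prints Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505
    }
}
```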
      

      I believe the capture instances should be ordered by the `create_date` column instead.
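      The proposed `create_date` ordering could look roughly like the following hypothetical standalone sketch; it is not a patch against Debezium. `Capture` and `pickCurrentAndFuture()` are illustrative names, and each row is reduced to its name and the `create_date` value from the query output (where start_lsn is identical but `create_date` differs). It relies on SQL Server allowing at most two capture instances per table. Sorting by `create_date` yields the same current/future pair regardless of the order in which the rows are returned.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CreateDateOrdering {

    // Hypothetical minimal model of a cdc.change_tables row: name and create_date only.
    static final class Capture {
        final String name;
        final LocalDateTime createDate;

        Capture(String name, LocalDateTime createDate) {
            this.name = name;
            this.createDate = createDate;
        }
    }

    // Returns {current, future}: the instance created first is "current", the later one is "future".
    // SQL Server allows at most two capture instances per table, so exactly two are expected here.
    static Capture[] pickCurrentAndFuture(List<Capture> captures) {
        List<Capture> sorted = new ArrayList<>(captures);
        sorted.sort(Comparator.comparing((Capture c) -> c.createDate));
        return new Capture[] { sorted.get(0), sorted.get(1) };
    }

    public static void main(String[] args) {
        // create_date values copied from the query output; start_lsn is identical for both rows.
        Capture oldCi = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX",
                LocalDateTime.parse("2018-11-30T13:13:12.530"));
        Capture newCi = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505",
                LocalDateTime.parse("2019-03-05T14:05:06.630"));

        // Both input orders print current=Stores_XXXXXXXXXXXXXXXXXXXX, future=Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505
        for (List<Capture> input : Arrays.asList(Arrays.asList(oldCi, newCi), Arrays.asList(newCi, oldCi))) {
            Capture[] pair = pickCurrentAndFuture(input);
            System.out.println("current=" + pair[0].name + ", future=" + pair[1].name);
        }
    }
}
```

      Two instances created within the same `create_date` tick would still tie, so a real fix may want a secondary sort key; the sketch leaves that out.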


          People

            Assignee: Unassigned
            Reporter: Grzegorz Kołakowski (Inactive)
            Votes: 0
            Watchers: 1
