Details
Bug
Resolution: Unresolved
Major
None
0.9.0.Final
None
Description
According to the documentation:
start_lsn - binary(10) Log sequence number (LSN) representing the low endpoint for querying the change table.
Suppose the schema of table T evolves: a new column is added. To handle this, a new capture instance is created. Initially, the new capture instance has a higher start_lsn than the old one.
But what happens if the old capture instance is not deleted for longer than the log retention period? Then both capture instances have exactly the same start_lsn, because they contain exactly the same rows (give or take some records where only the newly added column was modified; such rows exist only in the new capture instance).
    SELECT * FROM cdc.change_tables where capture_instance like 'Stores_XXXXXXXXXXXXXXXXXXXX%';
    GO

    column                row 1                           row 2
    --------------------  ------------------------------  -----------------------------------------
    object_id             1478296326                      1835869607
    version               0                               0
    source_object_id      1182627256                      1182627256
    capture_instance      Stores_XXXXXXXXXXXXXXXXXXXX     Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505
    start_lsn             0x0004A4820000ABCD0051          0x0004A4820000ABCD0051
    end_lsn               NULL                            NULL
    supports_net_changes  0                               0
    has_drop_pending      NULL                            NULL
    role_name             cdc_owner                       cdc_owner
    index_name            PK__Suppleme__69A0B800903E2D08  PK__Suppleme__69A0B800903E2D08
    filegroup_name        NULL                            NULL
    create_date           2018-11-30 13:13:12.530         2019-03-05 14:05:06.630
    partition_switch      1                               1
In that case Debezium may treat the old capture instance as the newer one (the order is in fact undefined). As a result, it ignores the newly added column, so data is partially lost.
    if (captures.size() > 1) {
        ChangeTable futureTable;
        if (captures.get(0).getStartLsn().compareTo(captures.get(1).getStartLsn()) < 0) {
            futureTable = captures.get(1);
        } else {
            currentTable = captures.get(1);
            futureTable = captures.get(0);
        }
        currentTable.setStopLsn(futureTable.getStartLsn());
        tables.add(futureTable);
        LOGGER.info("Multiple capture instances present for the same table: {} and {}", currentTable, futureTable);
    }
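The tie can be reproduced in isolation. The sketch below is not Debezium code: `Capture` and `pickFuture` are hypothetical stand-ins that only mirror the comparison in the snippet above, and the LSN is modelled as a fixed-width hex string (an assumption made for brevity). With equal start_lsn values the comparison is never `< 0`, so whichever instance happens to come first in the list is treated as the future table.

```java
import java.util.Arrays;
import java.util.List;

class LsnTieDemo {
    // Hypothetical stand-in for Debezium's ChangeTable; only the fields used here.
    static final class Capture {
        final String name;
        final String startLsn; // fixed-width hex string, so String order matches LSN order

        Capture(String name, String startLsn) {
            this.name = name;
            this.startLsn = startLsn;
        }
    }

    // Mirrors the quoted comparison: returns the instance that would become the future table.
    static Capture pickFuture(List<Capture> captures) {
        if (captures.get(0).startLsn.compareTo(captures.get(1).startLsn) < 0) {
            return captures.get(1);
        }
        return captures.get(0); // also taken when both LSNs are equal
    }

    public static void main(String[] args) {
        Capture oldInstance = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX", "0x0004A4820000ABCD0051");
        Capture newInstance = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505", "0x0004A4820000ABCD0051");

        // Same two instances, two possible result-set orders, two different answers.
        System.out.println(pickFuture(Arrays.asList(oldInstance, newInstance)).name);
        System.out.println(pickFuture(Arrays.asList(newInstance, oldInstance)).name);
    }
}
```

In the first call the old instance is picked as the future table, which is the case where the newly added column would be ignored.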
I believe the capture instances should be ordered by the `create_date` column.
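A minimal sketch of that ordering, assuming the `create_date` value has been read into a `LocalDateTime`. The `Capture` class, its `createDate` field and the `orderByCreateDate` helper are hypothetical names used for illustration only; they are not part of Debezium's `ChangeTable` API.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CaptureOrdering {
    // Hypothetical stand-in for a capture instance row from cdc.change_tables.
    static final class Capture {
        final String name;
        final String startLsn;
        final LocalDateTime createDate;

        Capture(String name, String startLsn, LocalDateTime createDate) {
            this.name = name;
            this.startLsn = startLsn;
            this.createDate = createDate;
        }
    }

    // Returns a copy sorted oldest first, so index 0 is the current table and index 1 the future one.
    static List<Capture> orderByCreateDate(List<Capture> captures) {
        List<Capture> sorted = new ArrayList<>(captures);
        sorted.sort(Comparator.comparing((Capture c) -> c.createDate));
        return sorted;
    }

    public static void main(String[] args) {
        Capture oldInstance = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX",
                "0x0004A4820000ABCD0051", LocalDateTime.parse("2018-11-30T13:13:12.530"));
        Capture newInstance = new Capture("Stores_XXXXXXXXXXXXXXXXXXXX_v190305140505",
                "0x0004A4820000ABCD0051", LocalDateTime.parse("2019-03-05T14:05:06.630"));

        // The input order no longer matters, even though both start_lsn values are equal.
        List<Capture> ordered = orderByCreateDate(Arrays.asList(newInstance, oldInstance));
        System.out.println("current: " + ordered.get(0).name);
        System.out.println("future:  " + ordered.get(1).name);
    }
}
```

Because `create_date` differs between the two instances even when their start_lsn values coincide, the result is the same for either input order.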