Debezium / DBZ-911

RecordsStreamProducer#columnValues() does not take into account unchanged TOASTed columns, refreshing table schemas unnecessarily


    Details

    • Steps to Reproduce:
      1. Create a table (in a PostgreSQL database) with at least one extended-storage column. A text column that stores HTML pages is a good example.
      2. Set the table's REPLICA IDENTITY to FULL.
      3. Populate the table, ensuring the extended-storage column's data triggers TOAST processing (use 2+ KB of text, for example). Create at least a few thousand rows.
      4. Update all (or a large number) of the table's rows, where the update does not change the extended-storage column's data.
      5. Observe that the PostgreSQL Connector takes several minutes to process the changes.
      6. Update all (or a large number) of the table's rows, this time changing the extended-storage column's data.
      7. Observe that the PostgreSQL Connector takes only several seconds to process the changes.

      Description

      If a row contains columns with TOASTed data and an update to that row does not alter the TOASTed data, Debezium's PostgreSQL Connector unnecessarily refreshes its local cache of table schemas while processing the update. In the worst case, this results in a query to the database for each update event on that table. The overhead is tens to low hundreds of milliseconds per event; multiplied across several thousand or more change events, the performance hit is severe. Where we would expect an update of, say, 3000 rows to be processed in several seconds, processing instead takes almost 10 minutes.
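The numbers above can be sanity-checked with a quick back-of-the-envelope calculation. The 200 ms per-event figure below is an assumption chosen from within the stated "tens to low hundreds of milliseconds" range:

```python
# Rough cost model for the unnecessary per-event schema refresh.
# refresh_seconds = 0.2 is an assumed value within the reported range.
events = 3000          # update events in the batch
refresh_seconds = 0.2  # assumed overhead of one schema refresh

total_seconds = events * refresh_seconds
total_minutes = total_seconds / 60
print(total_minutes)  # 10.0, matching the "almost 10 minutes" observation
```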

      I have confirmed this issue when using `REPLICA IDENTITY FULL`. This setting guarantees that the `before` record will provide all column values, since the WAL record contains the entire old row as the replica identity. The setting is rarely used because it is inefficient, which may explain why this bug has gone undetected; however, there are valid use cases, so the issue cannot be ignored. I suspect the issue also exists for other `REPLICA IDENTITY` settings, but I have not tested this. If so (and I am fairly confident it does), this is a serious performance issue that will affect all PostgreSQL users.
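The suspected mechanism can be illustrated with a small sketch. This is a hypothetical Python model, not the actual Java logic in RecordsStreamProducer#columnValues(): when a TOASTed column is unchanged, the logical decoding message omits its value, so a naive comparison of the received columns against the cached schema looks like a schema change and triggers a refresh against the database.

```python
# Hypothetical illustration of the suspected bug (names are invented;
# the real code lives in Debezium's RecordsStreamProducer, in Java).

cached_schema = {"id", "title", "body"}  # columns in Debezium's cached schema

def needs_refresh_naive(message_columns):
    # Buggy behaviour: any mismatch with the cached schema forces a
    # (slow) schema refresh, even for unchanged TOASTed columns.
    return set(message_columns) != cached_schema

def needs_refresh_fixed(message_columns, unchanged_toast_columns):
    # Fixed behaviour: columns missing from the message are tolerated
    # when they are known to be unchanged TOASTed columns.
    missing = cached_schema - set(message_columns)
    return bool(missing - unchanged_toast_columns)

# An UPDATE that leaves the TOASTed "body" column untouched: the
# decoding message carries only the columns that were transmitted.
received = ["id", "title"]
toasted_unchanged = {"body"}

print(needs_refresh_naive(received))                     # True: refreshes
print(needs_refresh_fixed(received, toasted_unchanged))  # False: no refresh
```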


                People

                • Assignee: ian-axelrod (Ian Axelrod)
                • Reporter: ian-axelrod (Ian Axelrod)
                • Votes: 0
                • Watchers: 3
