Status: Closed
Affects Version/s: 0.10.0.CR2
Fix Version/s: 0.10.0.Final
3x m5a.large Kafka instances on Kafka 2.3
1x m5a.large Kafka Connect instance running Debezium 0.10 on Confluent Platform 5.3.1
Steps to Reproduce:
1) Set up Kafka/Postgres/Debezium with the latest version and default configs
2) Insert a large number of records into a Postgres DB one at a time with no transactions.
3) Set up a Kafka consumer and observe that only 1 event per second arrives in the Kafka topic.
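Step 2 of the reproduction can be sketched by generating one single-row INSERT per statement; under autocommit, each statement is its own transaction, which is exactly the "no transactions" load pattern described above. The table and column names here are placeholders, not from the original report:

```python
# Sketch of step 2: build n single-row INSERTs. Executed with autocommit
# (e.g. via psql or a driver with autocommit enabled), each statement
# is committed as its own transaction.
def one_row_inserts(n, table="load_test"):
    """Return n independent single-row INSERT statements."""
    return [
        f"INSERT INTO {table} (id, payload) VALUES ({i}, 'row-{i}');"
        for i in range(n)
    ]

statements = one_row_inserts(1000)
print(len(statements))   # 1000 separate transactions under autocommit
print(statements[0])
```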
During load tests, I encountered a performance issue with Debezium sending events to Kafka from Postgres under the following conditions:
- Using pgoutput plugin
- Sending a large number of events 1 row at a time (no transactions)
Debezium's logs show that it sends 1 event to Kafka per `poll.interval.ms` interval, so for 1000 events with the default config it takes Debezium 16+ minutes to replicate everything to Kafka.
The bug does not occur with the wal2json plugin, nor does it affect performance when the events are all sent in a single transaction.
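The 16+ minute figure can be checked with back-of-envelope arithmetic, assuming the default `poll.interval.ms` of 1000 ms (an assumption, but consistent with the one-event-per-second observation):

```python
# At 1 event per poll interval, total replication time scales linearly
# with the number of events.
POLL_INTERVAL_MS = 1000   # assumed default poll.interval.ms
EVENTS = 1000

total_seconds = EVENTS * POLL_INTERVAL_MS / 1000
total_minutes = total_seconds / 60
print(total_minutes)      # ~16.7 minutes, matching the "16+ minutes" figure
```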
I ran two tests that convinced me the problem is indeed on Debezium's end:
1) I set up logical replication from one Postgres instance to another using pgoutput, manually creating the publication on the source and the subscription on the replica. The replica DB received all 1000 events, sent the same way, nearly instantaneously.
2) I set up replication from the Debezium postgres:11 Docker image and sent 1000 events. Mid-replication (still at 1 event per `poll.interval.ms`), I stopped the Postgres container; Debezium was still able to keep sending the remaining events to Kafka one by one, so they must already have been buffered on the Debezium side.
These tests suggest that a) the problem is indeed in Debezium's pgoutput code, and b) Debezium is having issues consuming from some internal message queue, as it consumes/sends only 1 event per `poll.interval.ms` from its internal queue instead of a larger batch as allowed by Kafka's batch.size config.
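The suspected behavior can be illustrated with a toy queue model. This is only an illustration of the symptom, not Debezium's actual implementation: if each poll drains a single event instead of a full batch, total time is gated by the poll interval rather than by the batch size.

```python
from collections import deque

# Toy model: a change-event queue drained by a periodic poll loop.
# Each poll removes at most batch_size events; total wall time is
# roughly polls_needed * poll.interval.ms.
def polls_needed(pending, batch_size):
    queue = deque(range(pending))
    polls = 0
    while queue:
        for _ in range(min(batch_size, len(queue))):
            queue.popleft()
        polls += 1
    return polls

print(polls_needed(1000, 1))     # 1000 polls -> 1000 x poll.interval.ms
print(polls_needed(1000, 2048))  # 1 poll when a full batch is drained
```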
I mentioned and discussed this bug with Chris on the Debezium Users Gitter channel and sent him my Kafka Connect logs. He has verified:
Hi @xinluo-gogovan I analyzed your log. In total, the log contained 3001 events that started at 06:41:49.374 and ended at 06:43:29.275; so roughly 100 seconds.
There were 1000 begins, 1000 commits, 1000 inserts, and 1 relational message - which is what I had hoped it would contain for the 1-by-1 performance benchmark you're doing.
So basically the log supports having 10 transactions emitted to Debezium per second.
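The quoted numbers are straightforward to sanity-check: 3001 events over roughly 100 seconds is about 30 events/s overall, and 1000 commits over the same window gives the 10 transactions/s figure above.

```python
# Recomputing the rates from the quoted log analysis
# (3001 events, 1000 transactions, ~100 seconds of log).
events, seconds, transactions = 3001, 100, 1000

events_per_second = events / seconds
tx_per_second = transactions / seconds
print(events_per_second)  # ~30 events/s (begin + insert + commit per row)
print(tx_per_second)      # 10 transactions/s reaching Debezium
```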