Debezium / DBZ-1213

All Connectors Disconnect After Closing One Connection

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.3.Final
    • Fix Version/s: None
    • Component/s: postgresql-connector
    • Labels:
      None
    • Steps to Reproduce:
      1. Create a Kafka setup with 3 brokers, 3 Connect servers, and 1 ZooKeeper node
      2. Connect ~10 apps
      3. Delete the connector for 1 of the apps via the Connect REST API
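
For reference, step 3 corresponds to the `DELETE /connectors/dev_app_review_app_3784/` request visible in the logs below. A minimal Python sketch of that call follows. The base URL is an assumption (`localhost` is a placeholder host; 8083 is Kafka Connect's default REST port), and the helper names are hypothetical, not part of any Debezium or Kafka tooling.

```python
import urllib.request
from urllib.parse import quote

# Assumption: placeholder address of one Connect worker's REST endpoint.
CONNECT_URL = "http://localhost:8083"


def build_delete_request(connector_name, base_url=CONNECT_URL):
    """Build the DELETE /connectors/{name} request without sending it."""
    url = f"{base_url.rstrip('/')}/connectors/{quote(connector_name, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def delete_connector(connector_name, base_url=CONNECT_URL):
    """Send the request and return the HTTP status (204 in the log above)."""
    request = build_delete_request(connector_name, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Connector name taken from the log; requires a reachable Connect worker.
    print(delete_connector("dev_app_review_app_3784"))
```

Separating request construction from sending keeps the URL logic checkable without a running cluster.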

      Description

      We are experiencing an issue where, when we make a REST call to delete one of the connectors, all the other connectors fail with the same error: `failed when reading from copy`.

      We are running the 0.9 version of the Debezium Docker containers for Kafka Connect, Kafka, and ZooKeeper on an EC2 server. Our PostgreSQL database is an RDS instance running version 10.6. We have 3 Kafka instances, 3 Connect instances, and 1 ZooKeeper instance running, with ~10 apps connected to them. Having all 10 up and running works fine, but we consistently see all the connectors go down after we remove one. Below are the logs we get when we delete one connector.

      connect-1_1  | 2019-04-01 23:03:37,567 INFO   ||  Successfully processed removal of connector 'dev_app_review_app_3784'   [org.apache.kafka.connect.storage.KafkaConfigBackingStore]
      connect-1_1  | 2019-04-01 23:03:37,567 INFO   ||  Connector dev_app_review_app_3784 config removed   [org.apache.kafka.connect.runtime.distributed.DistributedHerder]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  52.206.82.208 - - [01/Apr/2019:23:03:37 +0000] "DELETE /connectors/dev_app_review_app_3784/ HTTP/1.1" 204 0  505   [org.apache.kafka.connect.runtime.rest.RestServer]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  Rebalance started   [org.apache.kafka.connect.runtime.distributed.DistributedHerder]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  Stopping connector dev_app   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,070 INFO   ||  Stopping connector dev_app_review_app_3794   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,070 INFO   ||  Stopped connector dev_app   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,071 INFO   ||  Stopping connector dev_app_review_app_3784   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,071 INFO   ||  Stopped connector dev_app_review_app_3784   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,071 INFO   ||  Stopping connector dev_app_review_app_3786   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,071 INFO   ||  Stopped connector dev_app_review_app_3786   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,072 INFO   ||  Stopping connector dev_app_review_app_3787   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,072 INFO   ||  Stopped connector dev_app_review_app_3787   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,070 INFO   ||  Stopping connector dev_app_review_app_3793   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,070 INFO   ||  Stopping connector dev_app_review_app_3783   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  Stopping connector dev_app_review_app_3771   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  Stopping connector dev_app_review_app_3781   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  Stopping connector dev_app_review_app_3791   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,069 INFO   ||  Stopping connector dev_app_review_app_3790   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,081 INFO   ||  Stopped connector dev_app_review_app_3783   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,083 INFO   ||  Stopping connector dev_app_review_app_3788   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,083 INFO   ||  Stopped connector dev_app_review_app_3790   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,083 INFO   ||  Stopping connector dev_app_review_app_3739   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,083 INFO   ||  Stopped connector dev_app_review_app_3771   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopping task dev_app-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopped connector dev_app_review_app_3788   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopping task dev_app_review_app_3790-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopped connector dev_app_review_app_3739   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopping task dev_app_review_app_3791-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,080 INFO   ||  Stopping connector dev_app_review_app_3744   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,071 INFO   ||  Stopped connector dev_app_review_app_3794   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,103 INFO   ||  Stopping task dev_app_review_app_3781-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopped connector dev_app_review_app_3781   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,103 INFO   ||  Stopping task dev_app_review_app_3771-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,084 INFO   ||  Stopped connector dev_app_review_app_3791   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,103 INFO   ||  Stopping task dev_app_review_app_3793-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,083 INFO   ||  Stopped connector dev_app_review_app_3793   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,104 INFO   ||  Stopping task dev_app_review_app_3783-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,106 INFO   ||  Stopped connector dev_app_review_app_3744   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,107 INFO   ||  Stopping task dev_app_review_app_3794-0   [org.apache.kafka.connect.runtime.Worker]
      connect-1_1  | 2019-04-01 23:03:38,114 WARN   Postgres|dev_app_review_app_3781|records-stream-producer  Closing replication stream due to db connection IO exception...   [io.debezium.connector.postgresql.RecordsStreamProducer]
      connect-1_1  | 2019-04-01 23:03:38,116 WARN   Postgres|dev_app_review_app_3790|records-stream-producer  Closing replication stream due to db connection IO exception...   [io.debezium.connector.postgresql.RecordsStreamProducer]
      connect-1_1  | 2019-04-01 23:03:38,116 INFO   ||  WorkerSourceTask{id=dev_app_review_app_3781-0} Committing offsets   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      connect-1_1  | 2019-04-01 23:03:38,116 WARN   Postgres|dev_app_review_app_3783|records-stream-producer  Closing replication stream due to db connection IO exception...   [io.debezium.connector.postgresql.RecordsStreamProducer]
      connect-1_1  | 2019-04-01 23:03:38,117 INFO   ||  WorkerSourceTask{id=dev_app_review_app_3781-0} flushing 0 outstanding messages for offset commit   [org.apache.kafka.connect.runtime.WorkerSourceTask]
      connect-1_1  | 2019-04-01 23:03:38,682 ERROR  ||  WorkerSourceTask{id=dev_app_review_app_3787-0} Task threw an uncaught and unrecoverable exception   [org.apache.kafka.connect.runtime.WorkerTask]
      connect-1_1  | org.apache.kafka.connect.errors.ConnectException: An exception ocurred in the change event producer. This connector will be stopped.
      connect-1_1  | 	at io.debezium.connector.base.ChangeEventQueue.throwProducerFailureIfPresent(ChangeEventQueue.java:170)
      connect-1_1  | 	at io.debezium.connector.base.ChangeEventQueue.poll(ChangeEventQueue.java:151)
      connect-1_1  | 	at io.debezium.connector.postgresql.PostgresConnectorTask.poll(PostgresConnectorTask.java:156)
      connect-1_1  | 	at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:244)
      connect-1_1  | 	at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:220)
      connect-1_1  | 	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
      connect-1_1  | 	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
      connect-1_1  | 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      connect-1_1  | 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      connect-1_1  | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      connect-1_1  | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      connect-1_1  | 	at java.lang.Thread.run(Thread.java:748)
      connect-1_1  | Caused by: org.postgresql.util.PSQLException: Database connection failed when reading from copy
      connect-1_1  | 	at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1037)
      connect-1_1  | 	at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:41)
      connect-1_1  | 	at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:155)
      connect-1_1  | 	at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:124)
      connect-1_1  | 	at org.postgresql.core.v3.replication.V3PGReplicationStream.read(V3PGReplicationStream.java:70)
      connect-1_1  | 	at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.read(PostgresReplicationConnection.java:245)
      connect-1_1  | 	at io.debezium.connector.postgresql.RecordsStreamProducer.streamChanges(RecordsStreamProducer.java:131)
      connect-1_1  | 	at io.debezium.connector.postgresql.RecordsStreamProducer.lambda$start$0(RecordsStreamProducer.java:117)
      connect-1_1  | 	... 5 more
      connect-1_1  | Caused by: java.net.SocketException: Socket closed
      connect-1_1  | 	at java.net.SocketInputStream.read(SocketInputStream.java:204)
      connect-1_1  | 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
      connect-1_1  | 	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
      connect-1_1  | 	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
      connect-1_1  | 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
      connect-1_1  | 	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
      connect-1_1  | 	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
      connect-1_1  | 	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
      connect-1_1  | 	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
      connect-1_1  | 	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
      connect-1_1  | 	at org.postgresql.core.PGStream.receiveChar(PGStream.java:306)
      connect-1_1  | 	at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1079)
      connect-1_1  | 	at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1035)
      connect-1_1  | 	... 12 more
      connect-1_1  | 2019-04-01 23:03:38,684 ERROR  ||  WorkerSourceTask{id=dev_app_review_app_3787-0} Task is being killed and will not recover until manually restarted   [org.apache.kafka.connect.runtime.WorkerTask]
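
The last log line says the task will not recover until manually restarted. As a stopgap only (it does not address why the tasks fail), failed tasks can be found and restarted through Kafka Connect's REST API, using `GET /connectors/{name}/status` and `POST /connectors/{name}/tasks/{id}/restart`. The sketch below is a rough illustration under assumptions: the base URL passed in is a placeholder, and the function names are hypothetical.

```python
import json
import urllib.request


def failed_task_ids(status):
    """Return ids of tasks reported as FAILED in a connector status payload."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def restart_url(base_url, connector, task_id):
    """Build the task restart endpoint for one connector task."""
    return f"{base_url.rstrip('/')}/connectors/{connector}/tasks/{task_id}/restart"


def restart_failed_tasks(base_url, connector):
    """Fetch the connector status, then POST a restart for each failed task."""
    status_url = f"{base_url.rstrip('/')}/connectors/{connector}/status"
    with urllib.request.urlopen(status_url) as response:
        status = json.load(response)
    task_ids = failed_task_ids(status)
    for task_id in task_ids:
        request = urllib.request.Request(
            restart_url(base_url, connector, task_id), method="POST"
        )
        urllib.request.urlopen(request).close()
    return task_ids


if __name__ == "__main__":
    # Assumption: placeholder worker address; connector name taken from the log.
    print(restart_failed_tasks("http://localhost:8083", "dev_app_review_app_3787"))
```

Running this once per affected connector would restart only the tasks that Connect reports as `FAILED`.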
      

              People

              • Assignee: Unassigned
              • Reporter: Don Briggs (donbfern)
              • Votes: 1
              • Watchers: 3
