
Implement a CDC connector for Apache Cassandra

      Are there any plans to enable Cassandra support?

            [DBZ-607] Implement a CDC connector for Apache Cassandra

            Jiri Pechanec added a comment - Released

            Gunnar Morling added a comment - Thanks a lot for sending that first PR, jgao54; super-happy to see it!

            Hey abrarsheikh, welcome and nice seeing you here. Thanks for sharing those slides, they are really insightful. Looking forward to collaborating with you on this, too!

            Abrar Ahmed Sheikh (Inactive) added a comment - Sharing my slides from the DataStax conference on a possible design pattern for solving this problem: https://www.slideshare.net/AbrarSheikh1/streaming-data-from-cassandra-into-kafka. jgao54, thanks for picking this up; looking forward to collaborating.

            Gunnar Morling added a comment - Hey jgao54, I took the liberty of assigning this one to you.

            Gunnar Morling added a comment - Hey jgao54, criccomini, I just stumbled upon this one. I'd suggest using it as the hub for any related discussion.

            Tony Tony (Inactive) added a comment - From the documentation, it doesn't look like CDC requires any fewer configuration / preparation steps. For instance, CDC has to be enabled per table using "WITH cdc=true" and also in the cassandra.yaml file.

            The sidecar process has to be installed on the actual Cassandra node to monitor the CDC directory for new files (it would be interesting to know whether there is a way to flush the memtable more frequently). Also, there are some caveats when ingesting CDC data because it may contain replica data, so we need to figure out how and where to de-duplicate.
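
            As a rough illustration of the sidecar idea in the comment above (not code from this discussion, and not a claim about how any released connector works), the following Python sketch polls a directory for files it has not seen yet and drops events that repeat the same table, key, and write timestamp. The function `new_commit_logs`, the class `ReplicaDeduplicator`, and the event identity used for de-duplication are all hypothetical assumptions; parsing the commit log files themselves is out of scope here.

```python
from __future__ import annotations

from pathlib import Path


def new_commit_logs(cdc_dir: Path, seen: set[str]) -> list[Path]:
    """Return files in cdc_dir that earlier polls have not returned.

    `seen` is updated in place, so a caller can invoke this in a loop.
    Files are returned sorted by name (an assumption about ordering).
    """
    fresh = sorted(
        p for p in cdc_dir.iterdir() if p.is_file() and p.name not in seen
    )
    seen.update(p.name for p in fresh)
    return fresh


class ReplicaDeduplicator:
    """Hypothetical de-duplicator keyed on (table, primary key, write timestamp).

    Assumes that copies of the same write on different replicas carry the
    same write timestamp; that assumption is not confirmed by the thread.
    """

    def __init__(self) -> None:
        self._emitted: set[tuple[str, str, int]] = set()

    def is_new(self, table: str, primary_key: str, write_ts_micros: int) -> bool:
        """Return True the first time an event identity is seen, else False."""
        ident = (table, primary_key, write_ts_micros)
        if ident in self._emitted:
            return False
        self._emitted.add(ident)
        return True
```

            Since each node would run its own sidecar, an in-process set like this only catches duplicates seen by one process; de-duplicating across replicas would have to happen further downstream. The set also grows without bound in this sketch, so a real implementation would need some form of expiry.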

            Gunnar Morling added a comment - Yeah, I had a quick look at this a while ago to better understand the options, and I also concluded that 1) would be the way to go. Triggers are quite invasive and need special installation/configuration for each captured table, which isn't desirable.

            IIRC, the CDC support requires installing some component on the actual Cassandra host, which would be a bit different from the existing connectors, where we essentially use the DB's client API to connect to the server. So we'd have to figure out a way to get the changes from that other service into our connector (which runs within the Kafka Connect process).

            Tony Tony (Inactive) added a comment - gunnar.morling, let's discuss here how we can attack the problem. I need to have a chat with the company's legal team to see how to contribute at the end of the day.

            I see at least 2 options here:

            1. Cassandra CDC. Seems like the best option to go for, although we can't claim near real-time replication here due to the memtable flush mechanism.
            2. Cassandra Trigger. Because a trigger runs before the operation even completes, I wouldn't call it a reliable mechanism for replication, although it is much faster (near real-time) compared to #1.

            Gunnar Morling added a comment - Hi yandooo, it's one idea on our long-term roadmap, but so far we haven't seen many requests for it. I'd personally like to have Cassandra support, but really it's a question of the priorities and capacity we have. If you'd like to see it done sooner rather than later, your best chance would be to contribute a connector yourself. If you're interested, let me know and we can have a discussion on how to approach it.

              jgao54 Joy Gao (Inactive)
              yandooo Tony Tony (Inactive)