Using the Debezium MS SQL connector with ksqlDB embedded Kafka Connect
Prompted by a question on StackOverflow, I thought I'd take a quick look at setting up ksqlDB to ingest CDC events from Microsoft SQL Server using Debezium. Some of this is based on my previous article, Streaming data from SQL Server to Kafka to Snowflake ❄️ with Kafka Connect.
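To see where this ends up: once the Debezium SQL Server connector is installed into ksqlDB's embedded Kafka Connect worker, you can create the connector from ksqlDB itself. Here's a minimal sketch using the ksqlDB REST API, assuming a ksqlDB server on localhost:8088; the hostnames, credentials, and table name are placeholder values.

```bash
# Sketch: create the Debezium SQL Server source connector through ksqlDB's
# embedded Kafka Connect, via the ksqlDB REST API. All connection details
# below are hypothetical placeholders. The statement is on one line because
# JSON strings can't contain raw newlines.
curl -s -X POST http://localhost:8088/ksql \
     -H "Content-Type: application/vnd.ksql.v1+json" \
     -d @- <<'EOF'
{
  "ksql": "CREATE SOURCE CONNECTOR SOURCE_MSSQL WITH ('connector.class'='io.debezium.connector.sqlserver.SqlServerConnector', 'database.hostname'='mssql', 'database.port'='1433', 'database.user'='sa', 'database.password'='Admin123', 'database.dbname'='demo', 'database.server.name'='mssql', 'table.whitelist'='dbo.ORDERS', 'database.history.kafka.bootstrap.servers'='kafka:29092', 'database.history.kafka.topic'='dbhistory.mssql');",
  "streamsProperties": {}
}
EOF
```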
Setting up the Docker Compose
I like standalone, repeatable demo code. For that reason I love using Docker Compose, and I embed everything in there: connector installation, the kitchen sink, the works.
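The pattern, as a sketch (the connector version and the image's launch-script path are assumptions for a Confluent ksqlDB image), is to override the container's command in the Compose file so that it installs the connector plugin before starting the server:

```bash
# Sketch of the bootstrap that goes in the ksqlDB server container's
# `command:` stanza in Docker Compose: install the connector plugin from
# Confluent Hub, then hand over to the image's normal launch script.
# The connector version and launch-script path are assumptions.
bash -c 'echo "Installing connector plugins"
         confluent-hub install --no-prompt debezium/debezium-connector-sqlserver:1.2.2
         echo "Launching ksqlDB server"
         /etc/confluent/docker/run'
```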
Primitive Keys in ksqlDB
ksqlDB 0.7 will add support for message keys as primitive data types beyond just STRING (which is all we've had to date). That means that Kafka messages are going to be much easier to work with, and require less wrangling to get into the form in which you need them. Take the example of a database table that you've ingested into a Kafka topic and want to join to a stream of events. Previously you'd have had to run a ksqlDB processor over that topic just to re-key the messages so that ksqlDB could join on them. Friends, I am here to tell you that this is no longer needed!
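As a sketch of what this looks like (ksqlDB 0.7 syntax; the topic and column names here are made up), you declare the key column with its real type when you register the table, and it's immediately joinable:

```bash
# Sketch (ksqlDB 0.7 syntax; topic and column names are hypothetical):
# declare the message key as an INT directly, instead of re-keying a
# STRING-keyed topic first.
curl -s -X POST http://localhost:8088/ksql \
     -H "Content-Type: application/vnd.ksql.v1+json" \
     -d @- <<'EOF'
{
  "ksql": "CREATE TABLE CUSTOMERS (ROWKEY INT KEY, NAME VARCHAR) WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON');",
  "streamsProperties": {}
}
EOF
```

A stream whose messages are keyed by the same INT customer id can then be joined straight onto this table, with no intermediate re-keying step.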
Streaming data from SQL Server to Kafka to Snowflake ❄️ with Kafka Connect
Snowflake is the data warehouse built for the cloud, so let’s get all ☁️ cloudy and stream some data from Kafka running in Confluent Cloud to Snowflake!
What I’m showing also works just as well for an on-premises Kafka cluster. I’m using SQL Server as an example data source, with Debezium to capture and stream changes from it into Kafka.
I’m assuming that you’ve signed up for Confluent Cloud and Snowflake and are the proud owner of credentials for both. I’m going to use a demo rig based on Docker to provision SQL Server and a Kafka Connect worker, but you can use your own setup if you want.
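To give a flavour of the destination end, here's a sketch of the Snowflake sink connector configuration pushed to the Kafka Connect worker's REST API. The account URL, credentials, topic, and Schema Registry URL are all placeholders, and the private key is the PKCS#8 key paired with the public key you register in Snowflake.

```bash
# Sketch of creating the Snowflake sink via the Kafka Connect REST API.
# Every value below is a placeholder to be replaced with your own details.
curl -s -X PUT http://localhost:8083/connectors/sink_snowflake/config \
     -H "Content-Type: application/json" \
     -d @- <<'EOF'
{
  "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
  "topics": "mssql-01-dbo.ORDERS",
  "snowflake.url.name": "account.region.snowflakecomputing.com:443",
  "snowflake.user.name": "kafka_connector_user",
  "snowflake.private.key": "MIIEvQIBADANBg_placeholder_private_key",
  "snowflake.database.name": "DEMO_DB",
  "snowflake.schema.name": "PUBLIC",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "com.snowflake.kafka.connector.records.SnowflakeAvroConverter",
  "value.converter.schema.registry.url": "https://schema-registry-url"
}
EOF
```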
Debezium & MySQL v8 : Public Key Retrieval Is Not Allowed
I started hitting problems when trying Debezium against MySQL v8: when creating the connector, it failed with the error Public Key Retrieval is not allowed.
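The workaround generally used is to tell the MySQL JDBC driver to permit public key retrieval, which you can do through Debezium's database.* pass-through properties. A sketch, with placeholder connection details:

```bash
# Sketch of the fix: pass the MySQL Connector/J flag allowPublicKeyRetrieval
# through to the driver via Debezium's database.* pass-through properties.
# Hostname, credentials, and server name are placeholders.
curl -s -X PUT http://localhost:8083/connectors/source_mysql/config \
     -H "Content-Type: application/json" \
     -d @- <<'EOF'
{
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "database.hostname": "mysql",
  "database.port": "3306",
  "database.user": "debezium",
  "database.password": "dbz",
  "database.server.id": "42",
  "database.server.name": "asgard",
  "database.history.kafka.bootstrap.servers": "kafka:29092",
  "database.history.kafka.topic": "dbhistory.asgard",
  "database.allowPublicKeyRetrieval": "true"
}
EOF
```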
Using Kafka Connect and Debezium with Confluent Cloud
This is based on using Confluent Cloud to provide your managed Kafka and Schema Registry. All that you run yourself is the Kafka Connect worker.
Optionally, you can use this Docker Compose to run the worker and a sample MySQL database.
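The key part is the worker configuration that points it at Confluent Cloud. A sketch of the relevant properties, with placeholder broker endpoint and API key/secret; note that the same settings need repeating with consumer. and producer. prefixes so the worker's embedded clients can authenticate too:

```bash
# Sketch of the worker properties for connecting a self-managed Kafka
# Connect worker to Confluent Cloud. Broker endpoint and API key/secret
# are placeholders.
cat > connect-worker.properties <<'EOF'
bootstrap.servers=BROKER.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="CCLOUD_API_KEY" password="CCLOUD_API_SECRET";

consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="CCLOUD_API_KEY" password="CCLOUD_API_SECRET";

producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="CCLOUD_API_KEY" password="CCLOUD_API_SECRET";
EOF
```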
Streaming data from Oracle into Kafka
This is a short summary discussing what the options are for integrating Oracle RDBMS into Kafka, as of December 2018 (refreshed June 2020). For a more detailed background to why and how at a broader level for all databases (not just Oracle) see this blog and this talk.
What techniques & tools are there?
Franck Pachot has written up an excellent analysis of the options available here.
Streaming Data from MongoDB into Kafka with Kafka Connect and Debezium
Disclaimer: I am not a MongoDB person. These steps may or may not be appropriate and proper, but they worked for me :) Feel free to post in the comments if I’m doing something wrong.
MongoDB config - enabling replica sets
For Debezium to be able to stream changes from MongoDB, Mongo needs to have replication configured:
Docs: Replication / Convert a Standalone to a Replica Set
Stop Mongo:
rmoff@proxmox01 ~> sudo service mongod stop
Add replica set config to /etc/mongod.conf:
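What follows is a sketch of the standard approach; the replica set name rs0 is just an example.

```bash
# Append the standard replication stanza to /etc/mongod.conf (the replica
# set name 'rs0' is an arbitrary example), then restart mongod and
# initiate the replica set.
sudo tee -a /etc/mongod.conf <<'EOF'
replication:
  replSetName: rs0
EOF

sudo service mongod start

# Initiate the (single-node) replica set from the mongo shell
mongo --eval 'rs.initiate()'
```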
Streaming Data from MySQL into Kafka with Kafka Connect and Debezium
Debezium is a CDC tool that can stream changes from MySQL, MongoDB, and PostgreSQL into Kafka, using Kafka Connect. In this article we’ll see how to set it up and examine the format of the data. A subsequent article will show how to take this realtime stream of data from an RDBMS and join it to data originating from other sources, using KSQL.
The software versions used here are:
- Confluent Platform 4.0
- Debezium 0.7.2
- MySQL 5.7.19 with Sakila sample database installed
Install Debezium
To use Debezium, you need the relevant connector JAR for the source system (e.g. MySQL), and you need to make that JAR available to Kafka Connect. Here we’ll set it up for MySQL.
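As a sketch of what that means in practice (the download URL follows Debezium's Maven Central naming convention for plugin archives, and the target directory is an assumption; it needs to be somewhere the worker's plugin.path or CLASSPATH covers):

```bash
# Sketch of a manual install of the Debezium MySQL connector plugin.
# Version matches the post; the target path is an assumption and must be
# covered by the Kafka Connect worker's plugin.path (or CLASSPATH).
curl -sO https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/0.7.2/debezium-connector-mysql-0.7.2-plugin.tar.gz
sudo mkdir -p /usr/share/java/debezium-connector-mysql
sudo tar -xzf debezium-connector-mysql-0.7.2-plugin.tar.gz \
     --strip-components=1 -C /usr/share/java/debezium-connector-mysql
# Then restart the Kafka Connect worker so it picks up the new plugin.
```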