Aug 18, 2025

Kafka to Iceberg - Exploring the Options

You’ve got data in Apache Kafka.

You want to get that data into Apache Iceberg.

What’s the best way to do it?

Perhaps inevitably, the answer is: IT DEPENDS. But fear not: here is a guide to help you navigate your way to choosing the best solution for you 🫵.

Jul 4, 2025

Writing to Apache Iceberg on S3 using Kafka Connect with Glue catalog

Without wanting to mix my temperature metaphors, Iceberg is the new hawtness, and getting data into it from other places is a common task. I wrote previously about using Flink SQL to do this, and today I’m going to look at doing the same using Kafka Connect.

Kafka Connect can send data to Iceberg from any Kafka topic. The source Kafka topic(s) can be populated by a Kafka Connect source connector (such as Debezium), or a regular application producing directly to it.

Mar 13, 2025

Creating an HTTP Source connector on Confluent Cloud from the CLI

In this blog article I’ll show you how you can use the confluent CLI to set up a Kafka cluster on Confluent Cloud, the necessary API keys, and then a managed connector. The connector I’m setting up is the HTTP Source (v2) connector. It’s part of a pipeline that I’m working on to pull in a feed of data from the UK Environment Agency for processing. The data is spread across three endpoints, and one of the nice features of the HTTP Source (v2) connector is that one connector can pull data from more than one endpoint.

Mar 26, 2021

Loading CSV data into Confluent Cloud using the FilePulse connector

The FilePulse connector from Florian Hussonnois is a really useful connector for Kafka Connect which enables you to ingest flat files including CSV, JSON, XML, etc into Kafka. You can read more about it in its overview here. Other connectors for ingesting CSV data include kafka-connect-spooldir (which I wrote about previously), and kafka-connect-fs.

Here I’ll show how to use it to stream CSV data into a topic in Confluent Cloud. You can apply the same config pattern to any other secured Kafka cluster.

Mar 19, 2021

Using ksqlDB to process data ingested from ActiveMQ with Kafka Connect

The ActiveMQ source connector creates a Struct holding the value of the message from ActiveMQ (as well as its key). This is as would be expected. However, you can encounter challenges in working with the data if the ActiveMQ data of interest within the payload is complex. Things like converters and schemas can get really funky, really quick.

Mar 12, 2021

Kafka Connect JDBC Sink deep-dive: Working with Primary Keys

The Kafka Connect JDBC Sink can be used to stream data from a Kafka topic to a database such as Oracle, Postgres, MySQL, DB2, etc.

It supports many permutations of configuration around how primary keys are handled. The documentation details these. This article aims to illustrate and expand on this.
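
As a rough illustration of what those permutations look like, here is a minimal sketch in Python that builds a JDBC Sink connector config. The `pk.mode`, `pk.fields`, and `insert.mode` properties are the connector's own; the helper function, topic name, and connection URL are hypothetical and only there for the example.

```python
import json

# The four values that the JDBC Sink's pk.mode setting accepts.
PK_MODES = {"none", "kafka", "record_key", "record_value"}


def jdbc_sink_config(topic, connection_url, pk_mode="none", pk_fields=(), insert_mode="insert"):
    """Build a JDBC Sink config dict (hypothetical helper, not part of any library)."""
    if pk_mode not in PK_MODES:
        raise ValueError(f"pk.mode must be one of {sorted(PK_MODES)}, got {pk_mode!r}")
    config = {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": topic,
        "connection.url": connection_url,
        "insert.mode": insert_mode,
        "pk.mode": pk_mode,
    }
    if pk_fields:
        # pk.fields is a comma-separated list of field names
        config["pk.fields"] = ",".join(pk_fields)
    return config


# Assumed example: take the primary key from the "id" field of the message key
sink_config = jdbc_sink_config(
    topic="orders",                                     # hypothetical topic
    connection_url="jdbc:mysql://localhost:3306/demo",  # hypothetical database
    pk_mode="record_key",
    pk_fields=["id"],
    insert_mode="upsert",
)
print(json.dumps(sink_config, indent=2))
```

Swapping `pk_mode` between `record_key`, `record_value`, and `kafka` changes where the connector looks for the key columns: the message key, the message value, or the record's Kafka coordinates respectively.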

Mar 11, 2021

Kafka Connect - SQLSyntaxErrorException: BLOB/TEXT column … used in key specification without a key length

I got the error SQLSyntaxErrorException: BLOB/TEXT column 'MESSAGE_KEY' used in key specification without a key length with Kafka Connect JDBC Sink connector (v10.0.2) and MySQL (8.0.23).

Jan 11, 2021

Running a self-managed Kafka Connect worker for Confluent Cloud

Confluent Cloud is not only a fully-managed Apache Kafka service, but also provides important additional pieces for building applications and pipelines including managed connectors, Schema Registry, and ksqlDB. Managed Connectors are run for you (hence, managed!) within Confluent Cloud - you just specify the technology that you want to integrate with Kafka, in or out, and Confluent Cloud does the rest.

Jan 6, 2021

Creating topics with Kafka Connect

When Kafka Connect ingests data from a source system into Kafka it writes it to a topic. If you have set auto.create.topics.enable = true on your broker then the topic will be created when written to. If auto.create.topics.enable = false (as it is on Confluent Cloud and many self-managed environments, for good reasons) then you can tell Kafka Connect to create those topics first. This was added in Apache Kafka 2.6 (Confluent Platform 6.0) - prior to that you had to manually create the topics yourself, otherwise the connector would fail.
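
To give a feel for it, here is a minimal sketch in Python that adds the default topic creation rules to a source connector's config. The two `topic.creation.default.*` properties are the ones Kafka Connect reads; the helper function, file path, topic name, and the partition and replication values are assumptions for illustration.

```python
import json


def with_topic_creation(connector_config, partitions=6, replication_factor=3):
    """Return a copy of a source connector config with default topic creation
    rules added (hypothetical helper, not part of any library)."""
    config = dict(connector_config)
    config["topic.creation.default.partitions"] = str(partitions)
    config["topic.creation.default.replication.factor"] = str(replication_factor)
    return config


# Assumed example connector: the FileStream source that ships with Apache Kafka
base_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "file": "/tmp/example.txt",  # hypothetical input file
    "topic": "example-topic",    # hypothetical target topic
}

config = with_topic_creation(base_config)

# This is the shape of payload you would send to the Kafka Connect REST API
print(json.dumps({"name": "example-source", "config": config}, indent=2))
```

The worker also needs topic.creation.enable set to true, which is its default value.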

Jan 4, 2021

Kafka Connect - Deep Dive into Single Message Transforms

KIP-66 was added in Apache Kafka 0.10.2 and brought new functionality called Single Message Transforms (SMT). Using SMT you can modify the data and its characteristics as it passes through a Kafka Connect pipeline, without needing additional stream processors. For things like manipulating fields, changing topic names, conditionally dropping messages, and more, SMT are a perfect solution. If you get to things like aggregation, joining streams, and lookups then SMT may not be the best for you and you should head over to Kafka Streams or ksqlDB instead.
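
As a minimal sketch of how SMT are wired into a connector, the Python below adds a chain of two transforms to a config. InsertField and RegexRouter are transforms bundled with Apache Kafka; the helper function, the aliases, and the field and topic values are hypothetical.

```python
import json


def add_transforms(connector_config, transforms):
    """Return a copy of a connector config with a chain of SMT added.

    `transforms` is a list of (alias, settings) pairs, applied in order.
    Hypothetical helper, not part of any library.
    """
    config = dict(connector_config)
    config["transforms"] = ",".join(alias for alias, _ in transforms)
    for alias, settings in transforms:
        for key, value in settings.items():
            config[f"transforms.{alias}.{key}"] = value
    return config


smt_config = add_transforms(
    {"topics": "orders"},  # hypothetical, the rest of the connector config is omitted
    [
        # Add a static field to every message value
        ("addSource", {
            "type": "org.apache.kafka.connect.transforms.InsertField$Value",
            "static.field": "source_system",
            "static.value": "demo",
        }),
        # Change the topic name that the connector sees
        ("renameTopic", {
            "type": "org.apache.kafka.connect.transforms.RegexRouter",
            "regex": "(.*)",
            "replacement": "copy-$1",
        }),
    ],
)
print(json.dumps(smt_config, indent=2))
```

The order of aliases in `transforms` is the order in which each message passes through them.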