$ kafkacat -b broker:29092 -t mytestopic -C -e -q | wc -l
3
Why is kcat showing the wrong topics?
Much as I love kcat (🤫 it’ll always be kafkacat to me…), this morning I nearly fell out with it 👇
😖 I thought I was going stir crazy, after listing topics on a broker and seeing topics from a different broker.
😵 WTF 😵
Quick profiling of data in Apache Kafka using kafkacat and visidata
ksqlDB is a fantastically powerful tool for processing and analysing streams of data in Apache Kafka. But sometimes, you just want a quick way to profile the data in a topic in Kafka. I wrote about this previously with a convoluted (but effective) set of bash commands pipelined together to perform a GROUP BY on data. Then someone introduced me to visidata, which makes it all a lot quicker!
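The gist, as a rough sketch (the broker address, topic name, and use of visidata’s JSON Lines loader here are assumptions, not taken from the post): consume the topic with kafkacat’s JSON envelope output and pipe it straight into visidata to sort, filter, and aggregate interactively.

# Sketch with placeholder broker/topic: -J wraps each message in a JSON envelope,
# one per line, which visidata can read from stdin as JSON Lines
kafkacat -b localhost:9092 -t mytopic -C -e -q -J | vd -f jsonl -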
Loading delimited data into Kafka - quick & dirty (but effective)
Whilst Apache Kafka is an event streaming platform designed for, well, streams of events, it’s perfectly valid to use it as a store of data which perhaps changes only occasionally (or even never). I’m thinking here of reference data (lookup data) that’s used to enrich regular streams of events.
You might well get your reference data from a database where it resides and do so effectively using CDC - but sometimes it comes down to those pesky CSV files that we all know and love/hate. Simple, awful, but effective. I wrote previously about loading CSV data into Kafka from files that are updated frequently, but here I want to look at CSV files that are not changing. Kafka Connect simplifies getting data in to (and out of) Kafka but even Kafka Connect becomes a bit of an overhead when you just have a single file that you want to load into a topic and then never deal with again. I spent this afternoon wrangling with a couple of CSV-ish files, and building on my previous article about neat tricks you can do in bash with data, I have some more to share with you here :)
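As a sketch of the kind of quick and dirty load this is about (file, broker, and topic names are placeholders, not those from the article): drop the header row and pipe the remaining lines straight into kafkacat as a producer, one message per line.

# Placeholder names throughout: skip the CSV header row and produce each remaining
# line to the topic as the message value
tail -n +2 customers.csv | kafkacat -b localhost:9092 -t customers -P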
Performing a GROUP BY on data in bash
One of the fun things about working with data over the years is learning how to use the tools of the day, but also learning to fall back on the tools that are always there for you. One of those is bash and its wonderful library of shell tools.
There’s an even better way than I’ve described here, and it’s called visidata. I’ve written about it more over here.
I’ve been playing around with a new data source recently, and needed to understand more about its structure. Within a single stream there were multiple message types.
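A minimal sketch of that kind of shell GROUP BY, assuming a placeholder broker and topic and a hypothetical messageType field in the JSON payload:

# Pull all the messages, extract one field with jq, then count the distinct values -
# effectively SELECT messageType, COUNT(*) … GROUP BY messageType, most frequent first
kafkacat -b localhost:9092 -t mytopic -C -e -q | \
  jq -r '.messageType' | sort | uniq -c | sort -rn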
Ingesting XML data into Kafka - Option 1: The Dirty Hack
What would a blog post on rmoff.net be if it didn’t include the dirty hack option? 😁
The secret to dirty hacks is that they are often rather effective and when needs must, they can suffice. If you’re prototyping and need to JFDI, a dirty hack is just fine. If you’re looking for code to run in Production, then a dirty hack probably is not fine.
Setting key value when piping from jq to kafkacat
One of my favourite hacks for getting data into Kafka is using kafkacat and stdin, often from jq. You can see this in action with Wi-Fi data, IoT data, and data from a REST endpoint. This is fine for getting values into a Kafka message - but Kafka messages are key/value, and being able to specify a key can often be important.
Here’s a way to do that, using a separator and some jq magic. Note that at the moment kafkacat only supports single-byte separator characters, so you need to choose carefully. If you pick a separator that also appears in your data, it’s possibly going to have unintended consequences.
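For example (the REST endpoint, field names, and the choice of | as the separator are all illustrative): have jq emit the key, the separator, and then the value on each line, and tell kafkacat about the separator with -K.

# Illustrative endpoint and field names; '|' is the single-byte key/value separator
# and must not appear in the key itself
curl -s "https://example.com/api/readings" | \
  jq -r '.[] | [(.id|tostring), (.|tostring)] | join("|")' | \
  kafkacat -b localhost:9092 -t readings -P -K'|'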
Counting the number of messages in a Kafka topic
There’s ways, and then there’s ways, to count the number of records/events/messages in a Kafka topic. Most of them are potentially inaccurate, or inefficient, or both. Here’s one that falls into the potentially inefficient category: using kafkacat to read all the messages and pipe them to wc, which with the -l flag will tell you how many lines there are - and since each message is a line, how many messages you have in the Kafka topic:
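That one-liner looks like this (using the same broker and topic names as the snippet at the top of the page):

kafkacat -b broker:29092 -t mytestopic -C -e -q | wc -l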
Why JSON isn’t the same as JSON Schema in Kafka Connect converters and ksqlDB (Viewing Kafka messages bytes as hex)
I’ve been playing around with the new SerDes (serialisers/deserialisers) that shipped with Confluent Platform 5.5 - Protobuf and JSON Schema (these were added to the existing support for Avro). The serialisers (and associated Kafka Connect converters) take a payload and serialise it into bytes for sending to Kafka, and I was interested in what those bytes look like. For that I used my favourite Kafka swiss-army knife: kafkacat.
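A sketch of that kind of inspection (broker and topic names are placeholders): consume a single raw message and pipe it through a hex dump tool, and you can see the magic byte and four-byte schema ID that the Schema Registry-aware serialisers prepend to the payload.

# Placeholder broker/topic: read one message and dump its raw bytes -
# look for the 0x00 magic byte followed by the 4-byte schema ID
kafkacat -b localhost:9092 -t mytopic -C -e -q -c 1 | xxd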
How to install kafkacat on Fedora
A quick and dirty way to monitor data arriving on Kafka
I’ve been poking around recently with capturing Wi-Fi packet data and streaming it into Apache Kafka, from where I’m processing and analysing it. Kafka itself is rock-solid - because I’m using ☁️Confluent Cloud and someone else worries about provisioning it, scaling it, and keeping it running for me. But whilst Kafka works just great, my side of the setup (tshark running on a Raspberry Pi) is less than stable. For whatever reason it sometimes stalls and I have to restart the Raspberry Pi and restart the capture process.
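One quick and dirty check along those lines - a sketch of the general idea rather than necessarily the approach from the post, with placeholder broker and topic names - is to grab the newest message’s timestamp from kafkacat’s JSON envelope and see how stale it is:

# -o -1 starts one message before the end of each partition, -J adds a JSON envelope
# whose .ts field is the message timestamp in milliseconds
LAST_TS=$(kafkacat -b localhost:9092 -t wifi_pcap -C -e -q -o -1 -J | jq -s 'max_by(.ts) | .ts')
AGE_SECS=$(( $(date +%s) - LAST_TS / 1000 ))
echo "Newest message on the topic is ${AGE_SECS}s old"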
Streaming Wi-Fi trace data from Raspberry Pi to Apache Kafka with Confluent Cloud
Primitive Keys in ksqlDB
ksqlDB 0.7 will add support for message keys as primitive data types beyond just STRING (which is all we’ve had to date). That means that Kafka messages are going to be much easier to work with, and require less wrangling to get into the form in which you need them. Take an example of a database table that you’ve ingested into a Kafka topic, and want to join to a stream of events. Previously you’d have had to take the Kafka topic into which the table had been ingested and run a ksqlDB processor to re-key the messages such that ksqlDB could join on them. Friends, I am here to tell you that this is no longer needed!
Notes on getting data into InfluxDB from Kafka with Kafka Connect
When a message from your source Kafka topic is written to InfluxDB the InfluxDB values are set thus:
- Timestamp is taken from the Kafka message timestamp (which is either set by your producer, or the time at which it was received by the broker)
- Tag(s) are taken from the tags field in the message. This field must be a map type - see below
- Value fields are taken from the rest of the message, and must be numeric or boolean
- Measurement name can be specified as a field of the message, or hardcoded in the connector config.
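By way of illustration, here’s the logical shape of a message that fits those rules (topic, tag, and field names are invented, and the connector’s converter/schema requirements are glossed over for brevity):

# 'tags' is a map, 'temperature'/'humidity'/'heating_on' become numeric/boolean value
# fields, and 'measurement' names the InfluxDB measurement (all names here are made up)
echo '{"measurement":"sensor_readings","tags":{"host":"rpi-01","location":"kitchen"},"temperature":21.5,"humidity":48,"heating_on":false}' | \
  kafkacat -b localhost:9092 -t influx_sensor_readings -P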
Monitoring Sonos with ksqlDB, InfluxDB, and Grafana
Using Kafka Connect and Debezium with Confluent Cloud
This is based on using Confluent Cloud to provide your managed Kafka and Schema Registry. All that you run yourself is the Kafka Connect worker.
Optionally, you can use this Docker Compose to run the worker and a sample MySQL database.
Skipping bad records with the Kafka Connect JDBC sink connector
The Kafka Connect framework provides generic error handling and dead-letter queue capabilities which are available for problems with [de]serialisation and Single Message Transforms. When it comes to errors that a connector may encounter doing the actual pull or put of data from the source/target system, it’s down to the connector itself to implement logic around that. For example, the Elasticsearch sink connector provides configuration (behavior.on.malformed.documents) that can be set so that a single bad record won’t halt the pipeline. Others, such as the JDBC Sink connector, don’t provide this yet. That means that if you hit this problem, you need to manually unblock it yourself. One way is to manually move the offset of the consumer on past the bad message.
TL;DR: You can use kafka-consumer-groups --reset-offsets --to-offset <x> to manually move the connector past a bad message.
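Fleshed out slightly (connector name, topic, partition, and offsets here are hypothetical; a sink connector’s consumer group is named connect-<connector name>, and you’d pause or stop the connector before doing this):

# Hypothetical names/offsets: skip a bad record at offset 41 on partition 0 of 'orders'
# by moving the sink connector's consumer group forward to offset 42
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group connect-sink_postgres_orders \
  --topic orders:0 \
  --reset-offsets --to-offset 42 \
  --execute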
Copying data between Kafka clusters with Kafkacat
kafkacat gives you Kafka super powers 😎
I’ve written before about kafkacat and what a great tool it is for doing lots of useful things as a developer with Kafka. I used it too in a recent demo that I built, in which data needed manipulating in a way that I couldn’t easily do elsewhere. Today I want to share a very simple but powerful use for kafkacat as both a consumer and producer: copying data from one Kafka cluster to another. In this instance it’s getting data from Confluent Cloud down to a local cluster.
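At its simplest it’s just a consumer piped into a producer, along these lines (broker addresses and topic name are placeholders; a plain pipe like this copies values only, so add -K to both sides if you need to carry the keys across):

# Placeholder brokers/topic: read everything from the source cluster, exit at the end
# of the topic, and produce each message to the same-named topic on the target cluster
kafkacat -b source-cluster:9092 -t my_topic -C -e -q | \
  kafkacat -b target-cluster:9092 -t my_topic -P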
Reset Kafka Connect Source Connector Offsets
Manually delete a connector from Kafka Connect
Kafka Connect has a REST API through which all config should be done, including removing connectors that have been created. Sometimes though, you might have reason to want to do this manually - and since Kafka Connect running in distributed mode uses Kafka as its persistent data store, you can achieve this by manually writing to the topic yourself.
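As a sketch (connector name and config topic are placeholders - check your worker’s config.storage.topic setting): Connect keeps each connector’s configuration in its config topic under the key connector-<name>, so producing a NULL (tombstone) value for that key removes the connector.

# Placeholder connector and topic names: -K'|' marks '|' as the key/value separator,
# and -Z turns the empty value after it into a NULL tombstone
echo 'connector-source-debezium-orders-00|' | \
  kafkacat -b localhost:9092 -t docker-connect-configs -P -Z -K'|'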