Quick profiling of data in Apache Kafka using kafkacat and visidata
ksqlDB is a fantastically powerful tool for processing and analysing streams of data in Apache Kafka. But sometimes, you just want a quick way to profile the data in a topic in Kafka. I wrote about this previously with a convoluted (but effective) set of bash commands pipelined together to perform a GROUP BY on data. Then someone introduced me to visidata, which makes it all a lot quicker!
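To give a flavour of the kind of pipeline this enables, here’s a minimal sketch, assuming a broker at broker:29092 and a topic called mytopic holding JSON messages (both placeholder names), and that visidata’s jsonl loader is available:

# Pull every message currently in the topic (-e exits at the end, -q keeps the output clean)
# and open it straight in visidata as line-delimited JSON
$ kafkacat -b broker:29092 -t mytopic -C -e -q | vd -f jsonl

From there, visidata’s frequency-table view (Shift-F on a column) gives you the GROUP BY-style profile interactively.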
Loading delimited data into Kafka - quick & dirty (but effective)
Whilst Apache Kafka is an event streaming platform designed for, well, streams of events, it’s perfectly valid to use it as a store of data which perhaps changes only occasionally (or even never). I’m thinking here of reference data (lookup data) that’s used to enrich regular streams of events.
You might well get your reference data from a database where it resides and do so effectively using CDC - but sometimes it comes down to those pesky CSV files that we all know and love/hate. Simple, awful, but effective. I wrote previously about loading CSV data into Kafka from files that are updated frequently, but here I want to look at CSV files that are not changing. Kafka Connect simplifies getting data into (and out of) Kafka, but even Kafka Connect becomes a bit of an overhead when you just have a single file that you want to load into a topic and then never deal with again. I spent this afternoon wrangling with a couple of CSV-ish files, and building on my previous article about neat tricks you can do in bash with data, I have some more to share with you here :)
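As a rough sketch of the quick & dirty approach - assuming a broker at localhost:9092, a topic called reference_data, and a file lookup.csv whose first column is a natural key (all placeholder names) - you can pipe the file straight into kafkacat:

# Skip the header row and send each remaining line of the CSV as a message value
$ tail -n +2 lookup.csv | kafkacat -b localhost:9092 -t reference_data -P

# Or key each message on the first column, using '|' as the key/value separator
$ tail -n +2 lookup.csv | awk -F',' '{print $1 "|" $0}' | kafkacat -b localhost:9092 -t reference_data -P -K '|'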
Performing a GROUP BY on data in bash
One of the fun things about working with data over the years is learning how to use the tools of the day - but also learning to fall back on the tools that are always there for you - and one of those is bash and its wonderful library of shell tools.
There’s an even better way than I’ve described here, and it’s called visidata. I’ve written more about it over here.
I’ve been playing around with a new data source recently, and needed to understand more about its structure. Within a single stream there were multiple message types.
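To give a feel for what that looks like, here’s a minimal sketch of a GROUP BY in bash, assuming a topic called mytopic whose JSON messages carry a message_type field (both names are placeholders):

# Extract the field with jq, then sort | uniq -c does the grouping and counting
$ kafkacat -b broker:29092 -t mytopic -C -e -q | jq -r '.message_type' | sort | uniq -c | sort -rn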
Ingesting XML data into Kafka - Option 1: The Dirty Hack
What would a blog post on rmoff.net be if it didn’t include the dirty hack option? 😁
The secret to dirty hacks is that they are often rather effective, and when needs must, they can suffice. If you’re prototyping and need to JFDI, a dirty hack is just fine. If you’re looking for code to run in Production, then a dirty hack is probably not fine.
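For flavour, here’s a sketch of what such a hack might look like - assuming you have xq (the XML-to-JSON wrapper from the Python yq package) installed, and with placeholder names for the feed URL and topic:

# Fetch the XML, convert it to JSON with xq, and produce it to Kafka as a single message
$ curl -s "http://example.com/feed.xml" | xq -c '.' | kafkacat -b broker:29092 -t xml_as_json -P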
Setting key value when piping from jq to kafkacat
One of my favourite hacks for getting data into Kafka is using kafkacat and stdin, often from jq. You can see this in action with Wi-Fi data, IoT data, and data from a REST endpoint. This is fine for getting values into a Kafka message - but Kafka messages are key/value, and being able to specify a key can often be important.
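As a reminder of the basic value-only pattern (the endpoint and topic names here are just placeholders):

# Emit one compact JSON object per line and produce each line as a message value
$ curl -s "http://example.com/api/devices" | jq -c '.[]' | kafkacat -b broker:29092 -t devices -P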
Here’s a way to do that, using a separator and some jq magic. Note that at the moment kafkacat only supports single-byte separator characters, so you need to choose carefully. If you pick a separator that also appears in your data, it’s possibly going to have unintended consequences.
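Here’s a sketch of the idea, assuming each message has an id field to use as the key and using | as the separator (field, endpoint, and topic names are placeholders):

# Build "key|value" lines with jq, then tell kafkacat that '|' separates the key from the value
$ curl -s "http://example.com/api/devices" | jq -r '.[] | (.id|tostring) + "|" + tojson' | kafkacat -b broker:29092 -t devices -P -K '|'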
Why JSON isn’t the same as JSON Schema in Kafka Connect converters and ksqlDB (Viewing Kafka messages bytes as hex)
I’ve been playing around with the new SerDes (serialisers/deserialisers) that shipped with Confluent Platform 5.5 - Protobuf and JSON Schema (these were added to the existing support for Avro). The serialisers (and associated Kafka Connect converters) take a payload and serialise it into bytes for sending to Kafka, and I was interested in what those bytes look like. For that I used my favourite Kafka swiss-army knife: kafkacat.
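A quick sketch of how you might peek at those bytes, assuming a topic called mytopic (a placeholder): consume a single message and pipe it through hexdump.

# Consume one message and dump its raw value bytes as hex
# (with the schema-based SerDes, byte 0 is a magic byte and bytes 1-4 are the schema ID)
$ kafkacat -b broker:29092 -t mytopic -C -c1 -e -q | hexdump -C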