ksqlDB

A bash script to deploy ksqlDB queries automagically

There’s a bunch of improvements in the works for how ksqlDB handles code deployments and migrations. For now though, for deploying queries there’s the option of using headless mode (which is limited to one query file and disables subsequent interactive work on the server from a CLI), manually running commands (yuck), or using the REST endpoint to deploy queries automagically. Here’s an example of doing that.
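As a rough sketch of the REST option, the script below wraps the contents of a SQL file in the JSON body that ksqlDB's `/ksql` endpoint accepts and posts it with `curl`. The server address, the `KSQLDB_URL` variable, and the function names are assumptions for illustration, not the script from the full post. It also flattens newlines to spaces, so it will not cope with `--` line comments in the SQL file.

```shell
#!/usr/bin/env bash
# Assumed server address; override by exporting KSQLDB_URL.
KSQLDB_URL="${KSQLDB_URL:-http://localhost:8088}"

# Turn the contents of a SQL file into the JSON body for the /ksql endpoint.
build_payload() {
  local sql
  # Flatten newlines, carriage returns and tabs to spaces,
  # then escape backslashes and double quotes for JSON.
  sql=$(tr '\n\r\t' '   ' < "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"ksql":"%s","streamsProperties":{}}' "$sql"
}

# POST the statements in the given file to the ksqlDB server.
deploy() {
  build_payload "$1" | curl -s -X POST "$KSQLDB_URL/ksql" \
    -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
    -d @-
}

# Usage (hypothetical file name): deploy queries.sql
```

Keeping the payload construction separate from the HTTP call means you can inspect exactly what would be sent before pointing it at a real server.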

Continue Reading

Connecting to managed ksqlDB in Confluent Cloud with REST and ksqlDB CLI

Using ksqlDB in Confluent Cloud makes things a whole bunch easier because now you just get to build apps and streaming pipelines, instead of having to run and manage a bunch of infrastructure yourself.

Once you’ve got ksqlDB provisioned on Confluent Cloud you can use the web-based editor to build and run queries. You can also connect to it using the REST API and the ksqlDB CLI tool. Here’s how.

Continue Reading

Using ksqlDB to process data ingested from ActiveMQ with Kafka Connect

The ActiveMQ source connector creates a Struct holding the value of the message from ActiveMQ (as well as its key). This is as would be expected. However, you can encounter challenges in working with the data if the ActiveMQ data of interest within the payload is complex. Things like converters and schemas can get really funky, really quick.

Continue Reading

Loading delimited data into Kafka - quick & dirty (but effective)

Whilst Apache Kafka is an event streaming platform designed for, well, streams of events, it’s perfectly valid to use it as a store of data which perhaps changes only occasionally (or even never). I’m thinking here of reference data (lookup data) that’s used to enrich regular streams of events.

You might well get your reference data from a database where it resides and do so effectively using CDC - but sometimes it comes down to those pesky CSV files that we all know and love/hate. Simple, awful, but effective. I wrote previously about loading CSV data into Kafka from files that are updated frequently, but here I want to look at CSV files that are not changing. Kafka Connect simplifies getting data in to (and out of) Kafka but even Kafka Connect becomes a bit of an overhead when you just have a single file that you want to load into a topic and then never deal with again. I spent this afternoon wrangling with a couple of CSV-ish files, and building on my previous article about neat tricks you can do in bash with data, I have some more to share with you here :)
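For a one-off load like this, something along the following lines may be all that's needed. It is a minimal sketch that assumes the file has a single header row and one record per line; the broker address, topic, and file names are placeholders, and the function names are made up for the example.

```shell
#!/usr/bin/env bash
# Emit only the data rows of a CSV file: skip the header row and any blank lines.
csv_rows() {
  tail -n +2 "$1" | grep -v '^[[:space:]]*$'
}

# Produce each data row as one message. -P puts kafkacat in producer mode,
# where every line read from stdin becomes a separate message.
load_csv() {
  csv_rows "$1" | kafkacat -b localhost:9092 -t "$2" -P
}

# Usage (placeholder names): load_csv reference_data.csv reference_data
```

Each message value here is the raw CSV line with no key and no schema, so any parsing still has to happen downstream.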

Continue Reading

📼 ksqlDB HOWTO - A mini video series 📼

Some people learn through doing - and for that there’s a bunch of good ksqlDB tutorials here and here. Others may prefer to watch and listen first, before getting hands on. And for that, I humbly offer you this little series of videos all about ksqlDB. They’re all based on a set of demo scripts that you can run for yourself and try out.

🚨 Make sure you subscribe to my YouTube channel so that you don’t miss more videos like these!

Continue Reading

Kafka Connect, ksqlDB, and Kafka Tombstone messages

As you may already realise, Kafka is not just a fancy message bus, or a pipe for big data. It’s an event streaming platform! If this is news to you, I’ll wait here whilst you read this or watch this…

Continue Reading

Streaming Geopoint data from Kafka to Elasticsearch

Streaming data from Kafka to Elasticsearch is easy with Kafka Connect - you can see how in this tutorial and video.

One of the things that sometimes causes issues though is how to get location data correctly indexed into Elasticsearch as geo_point fields to enable all that lovely location analysis. Unlike data types like dates and numerics, Elasticsearch’s Dynamic Field Mapping won’t automagically pick up geo_point data, and so you have to do two things:

Continue Reading

ksqlDB - How to model a variable number of fields in a nested value (`STRUCT`)

There was a good question on StackOverflow recently in which someone was struggling to find the appropriate ksqlDB DDL to model a source topic in which there was a variable number of fields in a STRUCT.

Continue Reading

📌 🎁 A collection of Kafka-related talks 💝

Here’s a collection of Kafka-related talks, just for you.

Each one has 🍿🎥 a recording, 📔 slides, and 👾 code to go and try out.

Continue Reading

Using the Debezium MS SQL connector with ksqlDB embedded Kafka Connect

Prompted by a question on StackOverflow I thought I’d take a quick look at setting up ksqlDB to ingest CDC events from Microsoft SQL Server using Debezium. Some of this is based on my previous article, Streaming data from SQL Server to Kafka to Snowflake ❄️ with Kafka Connect.

Setting up the Docker Compose

I like standalone, repeatable, demo code. For that reason I love using Docker Compose and I embed everything in there - connector installation, the kitchen sink - the works.

Continue Reading

Counting the number of messages in a Kafka topic

There’s ways, and then there’s ways, to count the number of records/events/messages in a Kafka topic. Most of them are potentially inaccurate, or inefficient, or both. Here’s one that falls into the potentially inefficient category: use kafkacat to read all the messages and pipe them to wc, which with the -l flag will tell you how many lines there are. Since each message is a line, that’s how many messages you have in the Kafka topic:

$ kafkacat -b broker:29092 -t mytestopic -C -e -q| wc -l
       3

Continue Reading

🤖Building a Telegram bot with Apache Kafka, Go, and ksqlDB

I had the pleasure of presenting at DataEngBytes recently, and am delighted to share with you the 🗒️ slides, 👾 code, and 🎥 recording of my ✨brand new talk✨:

🤖Building a Telegram bot with Apache Kafka, Go, and ksqlDB

Continue Reading

Learning Golang (some rough notes) - S02E09 - Processing chunked responses before EOF is reached

The server sends Transfer-Encoding: chunked data, and you want to work with the data as you get it, instead of waiting for the server to finish, the EOF to fire, and then process the data?

Continue Reading

Why JSON isn’t the same as JSON Schema in Kafka Connect converters and ksqlDB (Viewing Kafka messages bytes as hex)

I’ve been playing around with the new SerDes (serialisers/deserialisers) that shipped with Confluent Platform 5.5 - Protobuf, and JSON Schema (these were added to the existing support for Avro). The serialisers (and associated Kafka Connect converters) take a payload and serialise it into bytes for sending to Kafka, and I was interested in what those bytes look like. For that I used my favourite Kafka swiss-army knife: kafkacat.
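To give a flavour of what looking at the bytes can show, here's a small sketch that uses `od` to hex-dump two hand-made payloads: plain JSON, and the same JSON behind the five-byte prefix (a zero "magic" byte plus a four-byte schema ID) that the Schema Registry serialisers put in front of each message. The payload and the schema ID of 1 are made up for the example, and `to_hex` is just a helper name.

```shell
#!/usr/bin/env bash
# Print whatever arrives on stdin as one continuous string of hex digits.
to_hex() {
  od -An -v -tx1 | tr -d ' \n'
}

# Plain JSON: the bytes are simply the text of the document.
printf '{"a":1}' | to_hex; echo
# prints 7b2261223a317d

# The same document behind a Schema Registry style prefix:
# one zero byte, then the schema ID (1 here, an invented value) as four bytes.
printf '\000\000\000\000\001{"a":1}' | to_hex; echo
# prints 00000000017b2261223a317d
```

Against a real topic the same helper can sit on the end of a consumer, for example `kafkacat -b broker:29092 -t mytopic -C -e -q -c 1 | to_hex`, where the topic name is a placeholder.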

Continue Reading

Loading CSV data into Kafka

For whatever reason, CSV still exists as a ubiquitous data interchange format. It doesn’t get much simpler: chuck some plaintext with fields separated by commas into a file and stick .csv on the end. If you’re feeling helpful you can include a header row with field names in.

order_id,customer_id,order_total_usd,make,model,delivery_city,delivery_company,delivery_address
1,535,190899.73,Dodge,Ram Wagon B350,Sheffield,DuBuque LLC,2810 Northland Avenue
2,671,33245.53,Volkswagen,Cabriolet,Edinburgh,Bechtelar-VonRueden,1 Macpherson Crossing

In this article we’ll see how to load this CSV data into Kafka, without even needing to write any code.

Continue Reading

Working with JSON nested arrays in ksqlDB - example

Question from the Confluent Community Slack group:

How can I access the data in object in an array like below using ksqlDB stream

"Total": [
        {
          "TotalType": "Standard",
          "TotalAmount": 15.99
        },
{
          "TotalType": "Old Standard",
          "TotalAmount": 16,
" STID":56
        }
]

Continue Reading

Building a Telegram bot with Apache Kafka and ksqlDB

Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, you’d like to be able to query information about them at any point. Sounds fun, right? What if you didn’t need any datastore other than Apache Kafka itself to be able to do this? What if you could ingest, filter, enrich, aggregate, and query data with just Kafka?

Continue Reading

Adventures in the Cloud, Part 94: ECS

My name’s Robin, and I’m a Developer Advocate. What that means in part is that I build a ton of demos, and Docker Compose is my jam. I love using Docker Compose for the same reasons that many people do:

Spin up and tear down fully-functioning multi-component environments with ease. No bespoke builds, no cloning of VMs to preserve "that magic state where everything works"
Repeatability. It’s the same each time.
Portability. I can point someone at a docker-compose.yml that I’ve written and they can run the same on their machine with the same results almost guaranteed.

Continue Reading

Primitive Keys in ksqlDB

ksqlDB 0.7 will add support for message keys as primitive data types beyond just STRING (which is all we’ve had to date). That means that Kafka messages are going to be much easier to work with, and require less wrangling to get into the form in which you need them. Take an example of a database table that you’ve ingested into a Kafka topic, and want to join to a stream of events. Previously you’d have had to take the Kafka topic into which the table had been ingested and run a ksqlDB processor to re-key the messages such that ksqlDB could join on them. Friends, I am here to tell you that this is no longer needed!

Continue Reading

Monitoring Sonos with ksqlDB, InfluxDB, and Grafana

I’m quite a fan of Sonos audio equipment but recently had some trouble with some of the devices glitching and even cutting out whilst playing. Under the covers Sonos stuff is running Linux (of course) and exposes some diagnostics through a rudimentary frontend that you can access at http://<sonos player IP>:1400/support/review. Whilst this gives you the current state, you can’t get historical data on it. It felt like the problems were happening "all the time", but were they actually?

Continue Reading

Robin Moffatt

Robin Moffatt is a Principal DevEx Engineer at Decodable. He likes writing about himself in the third person, eating good breakfasts, and drinking good beer.