📌 🎁 A collection of Kafka-related talks 💝
Here’s a collection of Kafka-related talks, just for you.
Each one has 🍿🎥 a recording, 📔 slides, and 👾 code to go and try out.
I’m convinced that a developer advocate can be effective remotely. As a profession, we’ve all spent two years figuring out how to do just that. Some of it worked out great. Some of it, less so.
I made the decision during COVID to stop travelling as part of my role as a developer advocate. In this article, I talk about my experience with different areas of advocacy done remotely.
I recently started writing an abstract for a conference later this year and realised that I’m not even sure if I want to do it. Not the conference—it’s a great one—but just the whole up on stage doing a talk thing. I can’t work out if this is just nerves from the amount of time off the stage, or something more fundamental to deal with.
This blog is written in Asciidoc, built using Hugo, and hosted on GitHub Pages. I recently wanted to share the draft of a post I was writing with someone and ended up exporting a local preview to a PDF - not a great workflow! This blog post shows you how to create an automagic hosted preview of any draft content on Hugo using GitHub Actions.
This is useful for previewing and sharing one’s own content, but also for making good use of GitHub as a collaborative platform - if someone reviews and amends your PR the post gets updated in the preview too.
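The essential step is a Hugo build that includes drafts and rewrites links to point at the preview location. A minimal sketch of that build command (the baseURL and output directory here are placeholder assumptions, not the actual workflow from the post):

# Build the site including draft posts, with links rewritten to point at
# the preview location rather than the production URL.
# The baseURL and destination below are hypothetical placeholders.
hugo --buildDrafts \
     --baseURL "https://example.github.io/blog-preview/" \
     --destination ./public-preview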
Kafka Summit London IS BACK! After COVID spoiled everyone’s fun and fundamentally screwed everything up for the past two years, I cannot wait to be back at an in-person conference. At the last Kafka Summit in the beforetimes (San Francisco, 2019) some of us got together for a run (or walk) across the Golden Gate Bridge. I can’t promise quite the same views, but I thought it would be fun to do something similar when we meet in London later this month.
This is the software counterpart to my previous article in which I looked at my workstation’s hardware setup. Some of these are unique or best-of-breed, others may have been sherlocked but I stick with them anyway :)
There are a bunch of improvements in the works for how ksqlDB handles code deployments and migrations. For now though, the options for deploying queries are headless mode (which is limited to one query file and disables subsequent interactive work on the server from a CLI), manually running commands (yuck), or using the REST endpoint to deploy queries automagically. Here’s an example of doing that.
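As a minimal sketch of the REST approach (the server address and the query itself are placeholder assumptions), you POST the statement to the server’s /ksql endpoint:

# Deploy a persistent query by POSTing it to the ksqlDB /ksql endpoint.
# The server address and the query are hypothetical examples.
curl --silent --request POST http://localhost:8088/ksql \
     --header "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     --data @- <<'EOF'
{
  "ksql": "CREATE STREAM ORDERS_ENRICHED AS SELECT * FROM ORDERS EMIT CHANGES;",
  "streamsProperties": {}
}
EOF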
The FilePulse connector from Florian Hussonnois is a really useful connector for Kafka Connect which enables you to ingest flat files including CSV, JSON, and XML into Kafka. You can read more about it in its overview here. Other connectors for ingesting CSV data include kafka-connect-spooldir (which I wrote about previously), and kafka-connect-fs.
Here I’ll show how to use it to stream CSV data into a topic in Confluent Cloud. You can apply the same config pattern to any other secured Kafka cluster.
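The heart of that config pattern is the standard Kafka client security settings. A minimal sketch of what the Kafka Connect worker needs in order to reach a SASL_SSL-secured cluster such as Confluent Cloud (the bootstrap server and credentials are placeholders):

# Append the security settings the Kafka Connect worker needs to reach a
# secured cluster; the bootstrap server and API key/secret are placeholders.
cat >> connect-worker.properties <<'EOF'
bootstrap.servers=pkc-12345.europe-west1.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
EOF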
Using ksqlDB in Confluent Cloud makes things a whole bunch easier because now you just get to build apps and streaming pipelines, instead of having to run and manage a bunch of infrastructure yourself.
Once you’ve got ksqlDB provisioned on Confluent Cloud you can use the web-based editor to build and run queries. You can also connect to it using the REST API and the ksqlDB CLI tool. Here’s how.
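As a quick sketch (the endpoint address and credentials are placeholders), both the CLI and the REST API authenticate with a ksqlDB API key over HTTP basic auth:

# Connect the ksqlDB CLI to a Confluent Cloud ksqlDB endpoint;
# the endpoint address and API key/secret are placeholders.
ksql -u <KSQLDB_API_KEY> -p <KSQLDB_API_SECRET> \
     https://pksqlc-12345.europe-west1.gcp.confluent.cloud:443

# The same credentials work against the REST API with basic auth:
curl -u <KSQLDB_API_KEY>:<KSQLDB_API_SECRET> \
     https://pksqlc-12345.europe-west1.gcp.confluent.cloud:443/info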
The ActiveMQ source connector creates a Struct holding the value of the message from ActiveMQ (as well as its key), which is what you’d expect. However, you can encounter challenges in working with the data if the ActiveMQ payload of interest is itself complex. Things like converters and schemas can get really funky, really quick.
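For example (and this is a hedged sketch: the stream name, server address, and the assumption that the message body is JSON carried in a string field of the Struct are mine, not the connector’s documented schema), one way through is to parse the payload downstream in ksqlDB:

# Parse a JSON message body carried as a string field of the Struct.
# Stream, field, and server names here are illustrative assumptions.
curl --silent --request POST http://localhost:8088/ksql \
     --header "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     --data @- <<'EOF'
{
  "ksql": "CREATE STREAM ACTIVEMQ_PARSED AS SELECT EXTRACTJSONFIELD(TEXT, '$.orderId') AS ORDER_ID FROM ACTIVEMQ_RAW EMIT CHANGES;",
  "streamsProperties": {}
}
EOF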
The Kafka Connect JDBC Sink can be used to stream data from a Kafka topic to a database such as Oracle, Postgres, MySQL, DB2, etc.
It supports many permutations of configuration around how primary keys are handled. The documentation details these. This article aims to illustrate and expand on this.
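To give a flavour of one such permutation (the connector name, topic, and connection details below are placeholders), here the primary key comes from the Kafka message key and rows are upserted on it:

# Create/update a JDBC sink that takes its primary key from the message
# key and upserts on it. Names and connection details are placeholders.
curl --silent --request PUT \
     http://localhost:8083/connectors/sink-jdbc-example/config \
     --header "Content-Type: application/json" \
     --data @- <<'EOF'
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "topics": "orders",
  "connection.url": "jdbc:mysql://mysql:3306/demo",
  "connection.user": "connect",
  "connection.password": "<PASSWORD>",
  "insert.mode": "upsert",
  "pk.mode": "record_key",
  "pk.fields": "order_id",
  "auto.create": "true"
}
EOF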
I got the error SQLSyntaxErrorException: BLOB/TEXT column 'MESSAGE_KEY' used in key specification without a key length with the Kafka Connect JDBC Sink connector (v10.0.2) and MySQL (8.0.23).
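MySQL won’t index a TEXT column without a prefix length, which is what the connector’s auto-created key column runs into. One workaround sketch (the table and column definitions here are illustrative assumptions, not the post’s exact fix) is to create the target table yourself with a bounded VARCHAR key before starting the connector:

# Pre-create the target table with a bounded VARCHAR primary key instead
# of letting auto.create map the key to TEXT. Names are illustrative.
mysql -u root -p demo -e "
  CREATE TABLE ORDERS (
    MESSAGE_KEY VARCHAR(64) NOT NULL,
    ORDER_DATA  TEXT,
    PRIMARY KEY (MESSAGE_KEY)
  );"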
ksqlDB is a fantastically powerful tool for processing and analysing streams of data in Apache Kafka. But sometimes, you just want a quick way to profile the data in a topic in Kafka. I wrote about this previously with a convoluted (but effective) set of bash commands pipelined together to perform a GROUP BY on the data. Then someone introduced me to visidata, which makes it all a lot quicker!
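As a minimal sketch of the idea (broker and topic names are placeholders), you can pipe a topic straight from kcat into visidata:

# Consume a topic of JSON records with kcat (formerly kafkacat) and load
# them into visidata for interactive profiling. Names are placeholders.
kcat -b localhost:9092 -t orders -C -e -q | vd -f jsonl

Once it’s loaded, Shift-F on a column gives you a frequency table: the GROUP BY, without the bash pipeline.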
Whilst Apache Kafka is an event streaming platform designed for, well, streams of events, it’s perfectly valid to use it as a store of data which perhaps changes only occasionally (or even never). I’m thinking here of reference data (lookup data) that’s used to enrich regular streams of events.
You might well get your reference data from a database where it resides and do so effectively using CDC - but sometimes it comes down to those pesky CSV files that we all know and love/hate. Simple, awful, but effective. I wrote previously about loading CSV data into Kafka from files that are updated frequently, but here I want to look at CSV files that are not changing. Kafka Connect simplifies getting data into (and out of) Kafka, but even Kafka Connect becomes a bit of an overhead when you just have a single file that you want to load into a topic and then never deal with again.

I spent this afternoon wrangling with a couple of CSV-ish files, and building on my previous article about neat tricks you can do in bash with data, I have some more to share with you here :)
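To give a flavour of the kind of thing I mean (the file, broker, and topic names below are placeholders), a one-off load can be as simple as a short pipeline into kcat:

# Strip the CSV header, use the first field as the message key, and
# produce the rest to a topic with kcat. All names are placeholders.
tail -n +2 lookup-data.csv \
  | awk -F',' '{ printf "%s|%s\n", $1, $0 }' \
  | kcat -b localhost:9092 -t reference_data -P -K '|'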
Some people learn through doing - and for that there’s a bunch of good ksqlDB tutorials here and here. Others may prefer to watch and listen first, before getting hands on. And for that, I humbly offer you this little series of videos all about ksqlDB. They’re all based on a set of demo scripts that you can run for yourself and try out.
🚨 Make sure you subscribe to my YouTube channel so that you don’t miss more videos like these!
One of the fun things about working with data over the years is learning how to use the tools of the day - but also learning to fall back on the tools that are always there for you. One of those is bash and its wonderful library of shell tools.
There’s an even better way than I’ve described here, and it’s called visidata. I’ve written about it more over here.
I’ve been playing around with a new data source recently, and needed to understand more about its structure. Within a single stream there were multiple message types.
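A quick sketch of how you might take a first census of those types (the broker, topic, and the messageType field name are all assumptions about the payload, not the actual data source):

# Consume the topic with kcat, pull out a type field with jq, and do the
# classic sort | uniq -c GROUP BY. Field and topic names are assumptions.
kcat -b localhost:9092 -t events -C -e -q \
  | jq -r '.messageType' \
  | sort | uniq -c | sort -rn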
tl;dr: specify the --user root argument:
docker exec --interactive \
--tty \
--user root \
--workdir / \
container-name bash
Confluent Cloud is not only a fully-managed Apache Kafka service, but also provides important additional pieces for building applications and pipelines, including managed connectors, Schema Registry, and ksqlDB. Managed connectors are run for you (hence, managed!) within Confluent Cloud - you just specify the technology with which you want to integrate, in or out of Kafka, and Confluent Cloud does the rest.