I write and speak a lot about Kafka, and get a fair few questions as a result. The most common question actually has nothing to do with Kafka, but is instead:
How do you make those cool diagrams?
I wrote about this originally last year, but have since evolved my approach. I’ve now pretty much ditched Paper in favour of Concepts, which was recommended to me after I published the previous post.
This week I was scheduled to speak at a couple of meetups, in Vienna and Munich. Flying is an inevitable part of travel, since I also happen to like being home seeing my family and airplanes are usually the quickest way to make that happen. I don’t particularly enjoy flying, though, and there’s the environmental impact of it too—so when I realised that Vienna and Munich are relatively close to each other I looked at getting the train.
Kafka Connect has a REST API through which all config should be done, including removing connectors that have been created. Sometimes though, you might have reason to want to do this manually—and since Kafka Connect running in distributed mode uses Kafka as its persistent data store, you can achieve this by writing to the topic yourself.
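As a sketch of the idea: the connector’s config lives in the config storage topic under a key of `connector-<name>`, so writing a tombstone (null value) for that key removes it. The broker address, topic name, and connector name below are all assumptions—check `config.storage.topic` in your worker config for the real topic.

```shell
# Assumed names: broker localhost:9092, config topic connect-configs,
# connector called my-connector. Publish a tombstone (null value, via -Z)
# keyed on the connector name:
echo 'connector-my-connector:' | \
  kafkacat -b localhost:9092 -t connect-configs -P -Z -K:
```

The `-K:` flag tells kafkacat to split key and value on the colon, and `-Z` sends the empty value as a NULL, i.e. a tombstone.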
Here’s a hacky way to automatically restart Kafka Connect connectors if they fail. Restarting automatically only makes sense if it’s a transient failure; if there’s a problem with your pipeline (e.g. bad records or a mis-configured server) then you don’t gain anything from this. You might want to check out Kafka Connect’s error handling and dead letter queues too.
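A rough sketch of the hack—this assumes the Connect REST API is on localhost:8083 and that `curl` and `jq` are available; you’d run it from cron or a loop:

```shell
# For each connector, check its state and restart it if it has FAILED.
for connector in $(curl -s "http://localhost:8083/connectors" | jq -r '.[]'); do
  state=$(curl -s "http://localhost:8083/connectors/$connector/status" | \
          jq -r '.connector.state')
  if [ "$state" = "FAILED" ]; then
    echo "Restarting $connector"
    curl -s -X POST "http://localhost:8083/connectors/$connector/restart"
  fi
done
```

Note this restarts the connector itself; individual tasks can also fail, and those have their own `/tasks/<id>/restart` endpoint.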
Kafka Connect configuration is easy - you just write some JSON! But what if you’ve got credentials that you need to pass? Embedding those in a config file is not always such a smart idea. Fortunately, with KIP-297 (released in Apache Kafka 2.0) there is support for external secrets. It’s extensible so you can use your own ConfigProvider, and it ships with one that simply reads credentials from a file - which I’ll show here. You can read more here.
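The gist of it, as a sketch: enable the file provider in the worker config, then reference values from a file in your connector JSON. The file path and property name below are illustrative.

```properties
# Worker config: enable the FileConfigProvider that ships with Kafka
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
```

In the connector config you can then write a placeholder such as `"connection.password": "${file:/secrets/creds.properties:MYSQL_PASSWORD}"` instead of embedding the credential itself.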
Kafka Connect exposes a REST interface through which all config and monitoring operations can be done. You can create connectors, delete them, restart them, check their status, and so on. But I found a situation recently in which I needed to delete a connector and couldn’t do so with the REST API. Here’s another way to do it: amending the Kafka topic that Kafka Connect in distributed mode uses to persist connector configuration. Note that this is not a recommended way of working with Kafka Connect—the REST API is there for a good reason :)
Kafka Connect is an API within Apache Kafka, and its modular nature makes it powerful and flexible. Converters are part of the API but not always fully understood. I’ve written previously about Kafka Connect converters, and this post is just a hands-on example to show even further what they are—and are not—about.
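For context, converters are set in the worker (or per-connector) config. An illustrative fragment—the class names are the ones that ship with Apache Kafka; adjust to your setup:

```properties
# Serialise keys as plain strings, values as JSON
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Whether the JSON carries an embedded schema envelope alongside the payload
value.converter.schemas.enable=false
```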
When you run Kafka Connect in distributed mode it uses a Kafka topic to store the offset information for each connector. Because it’s just a Kafka topic, you can read that information using any consumer.
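For example—assuming a local broker and the default topic name `connect-offsets` (check `offset.storage.topic` in your worker config for the real name)—you can dump the offsets with the stock console consumer:

```shell
# Print both key (connector/task identity) and value (the stored offset)
kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic connect-offsets \
    --from-beginning \
    --property print.key=true
```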
This post was prompted by a question on StackOverflow: the requirement is to take a series of events relating to a common key, and for each key output a series of aggregates derived from a changing value in the events. I’ll use the data from the question, based on ticket statuses. Each ticket can go through various stages, and the requirement was to show, per customer, how many tickets are currently at each stage.
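A very rough KSQL sketch of the shape of the answer—object and column names here are invented for illustration, and you should verify that your KSQL version supports aggregating over a table like this. The key point is that the aggregation needs to run over a table keyed by ticket ID, not over the raw stream of status-change events, so that each ticket counts only once, in its latest stage:

```sql
-- TICKETS is assumed to be a TABLE keyed by ticket ID, so each
-- ticket contributes only its current status to the count:
CREATE TABLE TICKETS_PER_STAGE AS
  SELECT CUSTOMER, STATUS, COUNT(*) AS TICKET_COUNT
  FROM TICKETS
  GROUP BY CUSTOMER, STATUS;
```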
I’ve hit these errors a couple of times now when creating a connector with Debezium against MySQL, and seen them asked about on StackOverflow too. In essence, they mean that you’ve not configured MySQL correctly for Debezium to be able to connect to it.
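As a reference point, these are the MySQL server settings Debezium’s documentation calls for—the values below are illustrative:

```ini
[mysqld]
server-id        = 223344      # must be set, and unique in the topology
log_bin          = mysql-bin   # binary logging must be enabled
binlog_format    = ROW         # Debezium requires row-based binlog
binlog_row_image = FULL
```

The MySQL user the connector logs in as also needs the SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, and REPLICATION CLIENT privileges.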
Before you can drop a stream or table that’s populated by a query in KSQL, you have to terminate any queries that read from or write to it. Here’s a bit of jq & xargs magic to terminate all queries that are currently running.
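A sketch of the approach—this assumes the KSQL server is on localhost:8088, and the `jq` path into the response may vary between KSQL versions:

```shell
# List all running queries, extract their IDs, and TERMINATE each one
curl -s -X POST "http://localhost:8088/ksql" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d '{"ksql": "SHOW QUERIES;"}' | \
  jq -r '.[].queries[].id' | \
  xargs -I{} curl -s -X POST "http://localhost:8088/ksql" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d '{"ksql": "TERMINATE {};"}'
```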
This post is the companion to an earlier one that I wrote about conference abstracts. In the same way that the last one was inspired by reviewing a ton of abstracts and noticing a recurring pattern in my suggestions, so this one comes from reviewing a bunch of slide decks for a forthcoming conference. They all look like good talks, but in several cases these great talks are fighting to get out from underneath the deadening weight of slides.
Herewith follows my highly-opinionated, fairly-subjective, and extremely-terse advice and general suggestions for slide decks. You can also find related ramblings in this recent post. My friend and colleague Vik Gamov also wrote a good post on this same topic, and linked to a good video that I’d recommend you watch.
I’ve written quite a few talks over the years, but usually as a side-line to my day job. In my role as a Developer Advocate, talks are part of What I Do, and so I can dedicate more time to it. A lot of the talks I’ve done previously have evolved through numerous iterations, and with a new talk to deliver for the "Spring Season" of conferences, I thought it would be interesting to track what it took from concept to actual delivery.
I began travelling for my job when my first child was three months old. But don’t mistake correlation for causation…it wasn’t the broken nights' sleep that forced me onto the road, but an excellent job opportunity that seemed worth the risk. Nearly eight years later, I’m in a different job but still with a bunch of travel involved. How much I travel has varied; it’s tended to average around 30%, but has peaked at way more than that.
By default Kafka Connect sends its log output to stdout, so you’ll see it on the console, in Docker logs, or wherever. Sometimes you might want to route it to a file instead, and you can do this by reconfiguring log4j. You can also change the configuration to get more (or less) detail in the logs by changing the log level.
Finding the log configuration file: the configuration file is called connect-log4j.properties and is usually found in etc/kafka/connect-log4j.
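An illustrative fragment of what the changes to connect-log4j.properties might look like—the appender name and file path here are assumptions, and the stdout appender referenced below is the one already defined in the stock file:

```properties
# Send log output to a file as well as stdout
log4j.rootLogger=INFO, stdout, FILE

log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.File=/var/log/kafka/connect.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=[%d] %p %m (%c)%n

# Turn up the detail for Kafka Connect itself
log4j.logger.org.apache.kafka.connect=DEBUG
```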
A script I’d batch-run on my Markdown files had inserted a UTF-8 non-breaking space between the Markdown heading indicator and the text, which meant that # My title got rendered as literal text, instead of as an H3 title.
Looking at the file contents, I could see it wasn’t just a space between the # and the text, but a non-breaking space.