Streaming data from Kafka to Elasticsearch is easy with Kafka Connect - you can see how in this tutorial and video.
One of the things that sometimes causes issues though is how to get location data correctly indexed into Elasticsearch as geo_point fields to enable all that lovely location analysis. Unlike data types like dates and numerics, Elasticsearch’s Dynamic Field Mapping won’t automagically pick up geo_point data, and so you have to do two things:
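As a hedged illustration of the explicit-mapping part (the template name, index pattern, and field-name pattern here are all invented, not from the original tutorial), an Elasticsearch dynamic template can be used to map any matching field as geo_point:

```bash
# Sketch only: create an index template so that any field ending in
# "_location" in matching indices is mapped as geo_point.
curl -XPUT "http://localhost:9200/_template/geo_template" \
     -H 'Content-Type: application/json' \
     -d '{
       "index_patterns": ["my_topic*"],
       "mappings": {
         "dynamic_templates": [
           {
             "locations": {
               "match": "*_location",
               "mapping": { "type": "geo_point" }
             }
           }
         ]
       }
     }'
```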
There was a good question on StackOverflow recently from someone struggling to find the appropriate ksqlDB DDL to model a source topic in which a STRUCT held a variable number of fields.
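One hedged way to model such a topic (not necessarily the answer given there): declare the variable portion as a MAP rather than a STRUCT, since a STRUCT requires every field to be declared up front. All names here are invented:

```sql
-- Sketch: a MAP accepts whatever keys arrive, where a STRUCT cannot.
CREATE STREAM my_stream (
  id      VARCHAR,
  payload MAP<VARCHAR, VARCHAR>
) WITH (KAFKA_TOPIC='my_topic', VALUE_FORMAT='JSON');

-- Individual values can then be pulled out by key:
SELECT id, payload['some_field'] AS some_field
FROM my_stream
EMIT CHANGES;
```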
Prompted by a question on StackOverflow I thought I’d take a quick look at setting up ksqlDB to ingest CDC events from Microsoft SQL Server using Debezium. Some of this is based on my previous article, Streaming data from SQL Server to Kafka to Snowflake ❄️ with Kafka Connect.
Setting up the Docker Compose

I like standalone, repeatable demo code. For that reason I love using Docker Compose, and I embed everything in there: connector installation, the kitchen sink, the works.
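As a hedged taste of where this ends up, here's the sort of Debezium SQL Server connector definition you can create from within ksqlDB; the hostname, credentials, table, and topic names are placeholders rather than the article's actual values:

```sql
-- Sketch: Debezium SQL Server CDC source connector, defined from ksqlDB.
CREATE SOURCE CONNECTOR SOURCE_MSSQL WITH (
  'connector.class'       = 'io.debezium.connector.sqlserver.SqlServerConnector',
  'database.hostname'     = 'mssql',
  'database.port'         = '1433',
  'database.user'         = 'sa',
  'database.password'     = 'Password!',
  'database.dbname'       = 'demo',
  'database.server.name'  = 'mssql',
  'table.whitelist'       = 'dbo.ORDERS',
  'database.history.kafka.bootstrap.servers' = 'broker:29092',
  'database.history.kafka.topic'             = 'dbhistory.demo'
);
```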
There’s ways, and then there’s ways, to count the number of records/events/messages in a Kafka topic. Most of them are potentially inaccurate, inefficient, or both. Here’s one that falls into the potentially inefficient category: using kafkacat to read all the messages and pipe them to wc, which with the -l flag will tell you how many lines there are - and since each message is one line, how many messages there are in the Kafka topic:
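In other words, something along these lines (the broker address and topic name are placeholders):

```bash
# Consume the whole topic from the beginning (-C), exit once the end of
# the last partition is reached (-e), suppress status noise (-q), and
# count the resulting lines.
kafkacat -b localhost:9092 -t my_topic -C -e -q | wc -l
```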
For whatever reason, CSV still exists as a ubiquitous data interchange format. It doesn’t get much simpler: chuck some plaintext with fields separated by commas into a file and stick .csv on the end. If you’re feeling helpful you can include a header row with field names in it.
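For example, something as minimal as this (contents invented for illustration) is a perfectly serviceable CSV file:

```csv
id,name,location
1,"Flux Capacitor","Hill Valley"
2,"Sonic Screwdriver","Gallifrey"
```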
Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, you’d like to be able to query information about them at any point. Sounds fun, right?
What if you didn’t need any datastore other than Apache Kafka itself to be able to do this? What if you could ingest, filter, enrich, aggregate, and query data with just Kafka?
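As a hedged illustration of the idea (every name here is invented), it boils down to something like this in ksqlDB: a continuously maintained aggregate over a filtered stream, plus a pull query against its current state:

```sql
-- Continuously aggregate a filtered stream into a table...
CREATE TABLE orders_by_user AS
  SELECT user_id,
         COUNT(*) AS order_ct
    FROM orders
   WHERE order_total > 10
   GROUP BY user_id;

-- ...and query its current state at any point, straight from Kafka.
SELECT order_ct FROM orders_by_user WHERE user_id = 'alice';
```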
My name’s Robin, and I’m a Developer Advocate. What that means in part is that I build a ton of demos, and Docker Compose is my jam. I love using Docker Compose for the same reasons that many people do:
Spin up and tear down fully-functioning multi-component environments with ease. No bespoke builds, no cloning of VMs to preserve "that magic state where everything works".
Repeatability. It’s the same each time.
Portability. I can point someone at a docker-compose.yml that I’ve written and they can run the same on their machine with the same results almost guaranteed.
ksqlDB 0.7 will add support for message keys as primitive data types beyond just STRING (which is all we’ve had to date). That means that Kafka messages are going to be much easier to work with, and require less wrangling to get into the form in which you need them. Take an example of a database table that you’ve ingested into a Kafka topic, and want to join to a stream of events. Previously you’d have had to take the Kafka topic into which the table had been ingested and run a ksqlDB processor to re-key the messages such that ksqlDB could join on them. Friends, I am here to tell you that this is no longer needed!
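Sketched out with invented names (and in current ksqlDB syntax, which has evolved since 0.7), it looks roughly like this: the table's key arrives as a primitive INT and can be joined on directly:

```sql
-- The ingested table, keyed by a primitive INT rather than a STRING.
CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR)
  WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='AVRO');

CREATE STREAM orders (customer_id INT, order_total DOUBLE)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='AVRO');

-- Join directly on the INT key - no re-keying step required.
SELECT o.order_total, c.name
  FROM orders o
  JOIN customers c ON o.customer_id = c.customer_id
  EMIT CHANGES;
```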
I’m quite a fan of Sonos audio equipment but recently had some trouble with some of the devices glitching and even cutting out whilst playing. Under the covers Sonos stuff is running Linux (of course) and exposes some diagnostics through a rudimentary frontend that you can access at http://<sonos player IP>:1400/support/review:
Whilst this gives you the current state, you can’t get historical data on it. It felt like the problems were happening "all the time", but were they actually?
Prompted by a question on StackOverflow I had a bit of a dig into how windows behave in ksqlDB, specifically with regards to their start time. This article also shows how to create test data in ksqlDB, including data with a timestamp in the past.
For a general background to windowing in ksqlDB see the excellent docs.
The nice thing about recent releases of ksqlDB/KSQL is that you can create and populate streams directly with CREATE STREAM and INSERT INTO respectively.
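For example (a hedged sketch with invented names; the exact WINDOWSTART syntax varies across versions):

```sql
-- Create a stream whose event time comes from a column in the data,
-- so inserted rows can carry timestamps in the past.
CREATE STREAM test_events (event_ts VARCHAR, val INT)
  WITH (KAFKA_TOPIC='test_events', PARTITIONS=1, VALUE_FORMAT='JSON',
        TIMESTAMP='event_ts', TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss');

INSERT INTO test_events (event_ts, val) VALUES ('2020-01-01 00:00:00', 1);
INSERT INTO test_events (event_ts, val) VALUES ('2020-01-01 00:42:00', 2);

-- Aggregate by window; WINDOWSTART shows where each window begins.
SELECT val, WINDOWSTART AS window_start, COUNT(*) AS ct
  FROM test_events
  WINDOW TUMBLING (SIZE 30 MINUTES)
  GROUP BY val
  EMIT CHANGES;
```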
This was prompted by a question on StackOverflow to which I thought the answer would be straightforward, but it turned out not to be. And then I got a bit carried away, and ended up with a nice example of how you can handle schema-less data coming from a system such as RabbitMQ and apply a schema to it.
Note: this same pattern for ingesting bytes and applying a schema will work with other connectors, such as MQTT.
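Sketched with invented names, the pattern looks like this: read the raw payload as a single string column, then parse typed fields out of it into a new stream with a proper schema attached:

```sql
-- Read the raw payload as a single VARCHAR column.
CREATE STREAM rabbit_raw (msg VARCHAR)
  WITH (KAFKA_TOPIC='rabbit_topic', VALUE_FORMAT='KAFKA');

-- Apply a schema by extracting and casting fields, writing the result
-- to a new topic as Avro.
CREATE STREAM rabbit_typed WITH (VALUE_FORMAT='AVRO') AS
  SELECT EXTRACTJSONFIELD(msg, '$.timestamp') AS event_ts,
         CAST(EXTRACTJSONFIELD(msg, '$.temperature') AS DOUBLE) AS temperature
    FROM rabbit_raw;
```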
In this post I want to build on my previous one and show another use of the Syslog data that I’m capturing. Instead of looking for SSH attacks, I’m going to analyse the behaviour of my networking components.
Note: You can find all the code to run this on GitHub.

Getting Syslog data into Kafka

As before, let’s create ourselves a syslog connector in ksqlDB:
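Something along these lines (a sketch; the listener port and connector name are assumptions):

```sql
CREATE SOURCE CONNECTOR SOURCE_SYSLOG_UDP WITH (
  'connector.class' = 'io.confluent.connect.syslog.SyslogSourceConnector',
  'topic'           = 'syslog',
  'syslog.port'     = '42514',
  'syslog.listener' = 'UDP',
  'tasks.max'       = '1'
);
```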
I’ve written previously about ingesting Syslog into Kafka and using KSQL to analyse it. It’s nearly two years since I wrote that article, and some things have changed, so I want to revisit the subject.
ksqlDB now includes the ability to define connectors from within it, which makes setting things up loads easier.
You can find the full rig to run this on GitHub.
Create and configure the Syslog connector

To start with, create a source connector:
This is a short summary of the options for integrating Oracle RDBMS with Kafka, as of December 2018 (refreshed June 2020). For a more detailed background on the why and how, at a broader level and for all databases (not just Oracle), see this blog and this talk.
What techniques & tools are there?

Franck Pachot has written up an excellent analysis of the options available here.
As of June 2020, this is what the line-up looks like: