Mar 10, 2025

Data Wrangling with Flink SQL

The UK Government publishes a lot of its data as open feeds. One that I keep coming back to is the Environment Agency’s flood-monitoring API, which gives access to an estate of sensors reporting measurements such as river levels and rainfall.

The data is well-structured and provided across three primary API endpoints. In this blog article I’m going to show you how I use Flink SQL to explore and wrangle these into a form from which I can then build a streaming pipeline.

Mar 6, 2025

Joining two streams of data with Flink SQL

There was a useful question on the Apache Flink Slack recently about joining data in Flink SQL:

How can I join two streams of data by id in Flink, to get a combined view of the latest data?

Mar 3, 2025

How to explode nested arrays with Flink SQL

Let’s imagine we’ve got a source of data with a nested array of multiple values. The data is from an IoT device. Each device has multiple sensors, and each sensor provides a reading.

Feb 28, 2025

Exploring UK Environment Agency data in DuckDB and Rill

The UK Environment Agency publishes a feed of data relating to rainfall and river levels. As a prelude to building a streaming pipeline with this data, I wanted to understand its data model first.

Feb 27, 2025

DuckDB tricks - renaming fields in a SELECT * across tables

I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I’d not yet worked out which columns I actually wanted. The issue was that the same field name existed in more than one table. This meant that in the results from the query, it wasn’t clear which field came from which table:

Feb 3, 2025

Interesting links - February 2025

Here’s a bunch of interesting links and articles about data that I’ve come across recently.

Dec 11, 2024

Disabling Vale Linting Selectively in Asciidoc

I’m a HUGE fan of Docs as Code in general, and specifically tools like Vale that lint your prose for adherence to style rules.

One thing that had been bugging me though was how to selectively disable Vale for particular sections of a document. Usually linting issues should be addressed at root: either fix the prose, or update the style rule. Either it’s a rule, or it’s not, right?

Sometimes though I’ve found a need to make a particular exception to a rule, or simply needed to skip linting for a particular file. I was struggling with how to do this in Asciidoc. Despite the documentation showing how to do it, I could never get it to work reliably. Now I’ve taken some time to dig into it, I think I’ve finally understood :)

Sep 2, 2024

Current 2024 - 5k Fun Run (or Walk)

At Current 24 a few of us will be going for an early run (or walk) on Tuesday morning. Everyone is very welcome!

May 22, 2024

How I Try To Keep Up With The Data Tech World (A List of Data Blogs)

I do my best to stay, if not abreast of, then at least aware of what’s going on in the world of data. That includes RDBMS, event streaming, stream processing, open source data projects, data engineering, object storage, and more. If you’re interested in the same, then you might find this blog useful, because I’m sharing my sources :)

May 3, 2024

ngrok DNS headaches

Let’s not bury the lede: it was DNS. However, unlike the meme ("It’s not DNS, it’s never DNS. It was DNS"), I didn’t even have an inkling that DNS might be the problem.

I’m writing a new blog about streaming Apache Kafka data to Apache Iceberg and wanted to provision a local Kafka cluster to pull data from remotely. I got this working nicely just last year using ngrok to expose the broker to the interwebz, so figured I’d use this again. Simple, right?

Nope.

rmoff’s random ramblings

✨ Data Engineering, Kafka, and other random geekery 🤓