Jun 27, 2025

Interesting links - June 2025

Not got time for all this? I’ve marked 🔥 for my top reads of the month :)

Jun 24, 2025

Writing to Apache Iceberg on S3 using Flink SQL with Glue catalog

In this blog post I’ll show how you can use Flink SQL to write to Iceberg on S3, storing metadata about the Iceberg tables in the AWS Glue Data Catalog. First off, I’ll walk through the dependencies and a simple smoke-test, and then put it into practice by writing data from a Kafka topic to Iceberg.

Jun 2, 2025

Digging into DuckLake

After a week’s holiday ("vacation", for y’all in the US) without a glance at anything work-related, what joy to return and find that the DuckDB folk have been busy, not only with the recent 1.3.0 DuckDB release, but also a brand new project called DuckLake.

Here are my brief notes on DuckLake.

May 23, 2025

Interesting links - May 2025

Not got time for all this? I’ve marked 🔥 for my top reads of the month :)

May 20, 2025

Exploring Joins and Changelogs in Flink SQL

SQL. Three simple letters. Ess Queue Ell. /ˌɛs kjuː ˈɛl/.

In the data world they bind us together, yet separate us.

As the saying goes, England and America are two countries divided by the same language, and the same goes for the batch and streaming world and some elements of SQL.

May 2, 2025

🏃🚶 The unofficial Current London 2025 Run/Walk 🏃🚶

Another year, another Current—another 5k run/walk for anyone who’d like to join!

Apr 25, 2025

It’s Time We Talked About Time: Exploring Watermarks (And More) In Flink SQL

Whether you’re processing data in batch or as a stream, the concept of time is an important part of accurate processing logic.

Because we process data after it happens, there are at least two different types of time to consider:

When it happened, known as Event Time
When we process it, known as Processing Time (or system time or wall clock time)
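
To make the difference between the two concrete, here’s a minimal sketch in plain Python (not Flink’s API). The `Event` class, the `window_start` and `lag` helpers, and the timestamps are all made up for illustration. It shows how one event can land in a different one-minute window depending on which notion of time you use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; event_time is when the thing actually happened."""
    name: str
    event_time: datetime


def window_start(ts: datetime, size: timedelta) -> datetime:
    """Return the start of the fixed-size (tumbling) window that contains ts."""
    return ts - ((ts - EPOCH) % size)


def lag(event: Event, processing_time: datetime) -> timedelta:
    """How long after the event happened we got around to processing it."""
    return processing_time - event.event_time


# An illustrative event that happened just before a minute boundary...
order = Event("order-placed", datetime(2025, 4, 25, 12, 0, 58, tzinfo=timezone.utc))
# ...and was processed just after it (a fixed value standing in for the wall clock).
processed_at = datetime(2025, 4, 25, 12, 1, 3, tzinfo=timezone.utc)

one_minute = timedelta(minutes=1)
event_window = window_start(order.event_time, one_minute)
processing_window = window_start(processed_at, one_minute)

print(f"lag:                    {lag(order, processed_at)}")
print(f"window by event time:   {event_window.time()}")
print(f"window by process time: {processing_window.time()}")
```

The event is only five seconds late, but that is enough to move it from the 12:00 window to the 12:01 window if you aggregate by processing time instead of event time.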

Apr 22, 2025

Interesting links - April 2025

So. Many. Interesting. Links. Not got time for all this? I’ve marked 🔥 for my top reads of the month :)

Mar 25, 2025

Confluent Cloud for Apache Flink - Exploring the API

Confluent Cloud for Apache Flink lets you run Flink workloads on a serverless platform in Confluent Cloud. After poking around the Confluent Cloud API for configuring connectors, I wanted to take a look at the same for Flink.

The API is particularly useful if you want to script a deployment, or automate a bulk operation that might be tiresome to do otherwise. It’s also handy if you just prefer living in the CLI :)

Mar 24, 2025

Interesting links - March 2025

The problem with publishing February’s interesting links at the beginning of the month and now getting around to publishing March’s at the end is that I have nearly two months' worth of links to share 😅 So with no further ado, let’s crack on.

rmoff’s random ramblings

✨ Data Engineering, Kafka, and other random geekery 🤓