Interesting links - October 2025

Published in Interesting Links at https://rmoff.net/2025/10/31/interesting-links-october-2025/

What with Current NOLA 2025 happening this week, and some very last minute preparations for the demo at the keynote on day 2, this month’s links roundup is pushing it right up to the wire :) The demo was pretty cool, and finally I have a good example of how this AI stuff actually fits into a workflow ;) I’ll write it up as a blog post (or two, probably)—stay tuned!

Some self-promotion to begin with:

  • This month a couple of colleagues and I launched Flink Watermarks…WTF. It’s an interactive explainer about watermarks in Apache Flink. Try it out and let me know what you think.

    • Oh, and I even designed some stickers for it!

  • I gave a talk about Blog Writing for Developers. Check out the link for the slides and an audio recording.

  • I was a guest on the Confluent Developer podcast - 🎥 video here, 🎧 audio here

With that, on with the interesting links!

Not got time for all this? I’ve marked 🔥 for my top reads of the month :)

Kafka and Event Streaming

Stream Processing

Streaming Analytics

Analytics

Data Platforms, Architectures, and Modelling

Data Engineering, Pipelines, and CDC

  • Debezium 3.4.0.Alpha1 has been released, which includes support for Postgres 18, OpenLineage output from Debezium Server, improvements to the Oracle LogMiner support, and more.

  • What’s the best way to add a new table in Debezium? Fiore Mario Vitale explains it here, including things to watch out for.

  • I enjoyed reading this one, as my assumption about partitioning is exactly what Kirill Bobrov says here is not the way to do it (and explains an alternative approach instead).

  • 🔥 It can’t really be a month of interesting links without at least one from Jack Vanlightly, and this month we have three :) This post is a well-reasoned argument as to why he is not a fan of zero-copy for getting data from Kafka to Iceberg.

  • A two-part series from Kakao describing how they implemented and troubleshot a CDC pipeline with Kafka Connect from Postgres to Elasticsearch. It’s in Korean, but if you open it in Chrome (or similar), the in-browser translation tool will work wonders :)

  • A decent comparison of the open-source data ingestion frameworks (Flink/Kafka Connect/Spark) from Shiyan Xu at Onehouse. If you notice a recurring theme of Spark cost and performance optimisation then I’m sure it’s not because Onehouse have their own tool to fix that ;)

  • A summary from ByteByteGo on how Pinterest use CDC.

  • Fresha have burst onto the data engineering blogging scene in recent months, sharing all sorts of excellent details about their platforms. This post from Emiliano Mancuso explains why they moved from JSON to Avro in their CDC pipelines to Snowflake.

Open Table Formats (OTF), Catalogs, Lakehouses etc.

RDBMS

General Data Stuff

AI

I warned you last month…this AI stuff is here to stay, and it’d be short-sighted to think otherwise. As I read and learn more about it, I’m going to share interesting links (the clue is in the blog post title) that I find—whilst trying to avoid the breathless hype and slop.

  • I wrote a post trying to get my head around what we mean by Agents.

  • Basic Memory is a very cool MCP server that integrates with your AI tool and acts as a memory of your conversations, storing the information locally in Markdown. It integrates very neatly with Obsidian. I’m a big fan.

  • Confluent announced a bunch of neat stuff at Current this week, including a real-time context engine and streaming agents. Product blog posts are m’kay I guess, but I always like to see the hands-on detail, and so I enjoyed reading my colleague Yash Anand’s example of building with streaming agents.

  • 🔥 Very cool talk (video / slides) from Ty Smith and Adam Huda with real-world examples of how Uber’s developers are using AI and what benefits they’re seeing.

  • Apache Flink Agents is a sub-project of Apache Flink, and they just had their first release.

  • Claude Skills are the latest hawtness (at least until the next thing comes along tomorrow), and Gordon Murray has published a set of them with support for technologies including Flink, Fluss, and Iceberg.

  • As well as changing how we get things done, AI is probably going to change how we build platforms too. Ananth Packkildurai has a good analysis of two papers looking at how Agents use data and how systems might be better designed for that, and Ciro Greco looks at how Agents involved in carrying out data engineering tasks might drive platform requirements.

And finally…

Nothing to do with data, but stuff that I’ve found interesting or has made me smile.

Think

Tool

  • I used freedium.cfd in previous editions of this series, and unfortunately it’s gone offline. scribe.rip is similar in concept—read Medium articles, without having to go to Medium.com (because, paywall, etc). I’m not going to use it on the links in this blog post (like I did with freedium.cfd) because everything breaks if/when it goes offline.

  • time.is is a very useful site that displays the current time for any timezone. It’s got a lovely clean interface, and a neat UX where you can just append the timezone to the URL: https://time.is/gmt, https://time.is/pt, etc.

Nerd

