Interesting links - September 2025

Published in Interesting Links at https://rmoff.net/2025/09/30/interesting-links-september-2025/

Sneaking it in just before the end of the month!

It’s a bumper set of links this month—I started with an original backlog of 125 links to get through. Some fell by the wayside, but plenty of others (78, to be precise) made the cut. With no further ado, let’s get cracking!

Not got time for all this? I’ve marked 🔥 for my top reads of the month :)

Data Engineering and Architecture

AI

Note

Wait, what’s this? A new section this month, all about AI? Is Robin now drinking the hype-juice too?

Don’t worry, this isn’t a rebranding of rmoff.net to ai-ai-ai-vc-money.plz (at least not yet). Whilst I’ve been an avid user of AI for some time now (mostly through Raycast's AI features), I’ve started to take an interest in understanding it in more detail. This month I wrote up a few note-taking articles as I learn more about MCP, Models, RAG, and some general rambling and corrections.

AI is important, and it’s here to stay. To the nay-sayers who scoff at the errors it makes and laugh at the idea that it can do our jobs…you are missing the point. Some of the attitudes I’ve encountered give me heavy vibes of Oracle DBAs 15 years ago who derided the idea of "The Cloud". That came to pass, completely upending how we build things—and so will AI. (We’ll ignore Blockchain for now…not every hype turns into reality 😉).

🔥 Sam Newman posted an excellent note on LinkedIn, which begins:

To those of you who are deeply pessimistic around the use of AI in software delivery, the old quote from John Maynard Keynes comes to mind:

"The market can remain irrational longer than you can remain solvent".

Go read the rest of the post (it’s not long). In addition, Scott Werner’s article 🔥 The Only Skill That Matters Now puts it even more clearly into focus, with a nice analogy about how "skating to the puck" is no longer a viable strategy (tl;dr the rate of change in AI means you have no idea where the puck will even be).

The impact of AI is going to be felt universally. Here are some interesting articles that I’ve come across this month about it in the sphere of data:

  • A paper titled Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First discussing the ways in which LLMs want to retrieve data and how we might change how we model data to support that. Murat Demirbas has a nice analysis and commentary on the paper.

  • A summary of a presentation given by Xintong Song on the new Flink Agents project (a formal sub-project of Apache Flink itself)

  • MCP servers are a nice way to provide standard interoperability between LLMs and other computer systems. I’ve not tried these ones out specifically but the idea of being able to chat to Claude about Flink, Kafka, and Confluent Cloud certainly sounds like a cool idea :)

  • A good account from Pedro Nascimento of why what sounds like a simple enough idea ("build an AI-powered data analyst") is a lot more complex than you may think.

Iceberg (and other OTF/Data Lake stuff)

I mean, there may be some Delta Lake, Hudi, and DuckLake in here…but in my corner of the internet it’s Iceberg all the way…

Kafka and Event Streaming

Stream Processing

General Data Stuff

Data in Action

  • Details of how Netflix built a Write-Ahead Log (WAL) to make their data platform more resilient.

  • Cursor migrated from AWS Aurora Limitless to PlanetScale.

  • Wix saved 50% of their data platform costs by moving their Spark workloads from EMR to EMR on EKS; they cover why and how in this two-part series.

  • dbt in action at BlaBlaCar.

  • 🔥 Netflix originally built their Muse analytics platform on Druid with offline Spark, but to meet performance requirements they moved to their homegrown Hollow tool for pre-aggregating data, while still using Druid alongside Spark and Iceberg offline.

  • Some details of the data architecture at Decathlon, and how they use Polars.

  • How Stripe use Apache Flink for real-time analytics.

  • Details of how Uber replicate between their two HDFS-based data lakes using HiveSync.

  • 🔥 A nice under-the-covers look at Fresha’s data lakehouse architecture from Paritosh Anand.

  • Chick-fil-A’s Caleb Lampert describes their Data Asset Certification Framework (and its relationship to soup…)

  • Airbnb built their own K/V store called Mussel—read about the original V1 and the re-architected V2.

  • Metagenomi write about how they use LanceDB on S3.

  • A write-up of a talk given by Xiaotong Jiang from Databricks on how they approach OLTP database performance and optimisation in a multi-tenant architecture.

  • Details of how Bazaarvoice migrated from RDS MySQL to AWS Aurora.

  • 🔥 A deep-dive from Stephanie Wang (previously a founding engineer at MotherDuck) on how MotherDuck is built.

  • Practical tips from Sadeq Dousti at Trade Republic on the implementation of the outbox pattern, based on their experiences.

  • How Grab use Pinot (and Kafka and Flink) for low-latency analytics.

Newsletters

If you can’t wait for this monthly round-up of links, you might like the following:

