Interesting links - January 2026

Published in Interesting Links at https://rmoff.net/2026/01/20/interesting-links-january-2026/

This is the twelfth edition of this newsletter in its current form. It’s great to see the audience for it growing, and the consistently positive reception when I share it. Nice words always inspire me to carry on with it :D The Substack edition (which is exactly the same content but sent out by email) is also picking up views and subscribers.

A couple of blog posts from me since the last edition of Interesting Links—both outside the usual Kafka/Flink scope:

I’ve also been firmly on-board the Claude Code/Opus 4.5 bandwagon, giggling like a child at the sheer magnitude of what it can now do. I’m going to write a dedicated blog post about it in more detail shortly, but if you want to marvel at what AI can do: I migrated my previous talks site (and the one before that) to this brand new one. Without writing a single line of code. Not one. Not a single byte.

So anyway. More AI blog posts to come, but for now—on to the Interesting Links!

Tip
  • 🔥 Not got time for all this? I’ve marked my top reads of the month :)

  • 📧 Want to receive this monthly round-up as an email? Subscribe to my Substack where I cross-post the same content

  • 🔗 Medium posts often skulk behind a gate, so I’ve hyperlinked to the Freedium version and included a link to the original using a ⓜ️ icon should you prefer to visit that (or if Freedium goes offline).

Looking back (Reviews of 2025)…

See below for AI-specific links about 2025.

…and looking forward (predictions for 2026)

See below for AI-specific links about 2026.

  • Oxide & Friends (Bryan Cantrill & Adam Leventhal, plus Simon Willison, Steve Klabnik, and Ian Grunert).

  • Ben Lorica - Data Engineering in 2026: What Changes?

  • Paul Dix - 2026: The Great Engineering Divergence.

  • 🔥 Joe Reis - 2026 - General Thoughts on What’s Ahead.

  • Ian Cook (Columnar) - 10 Predictions for Data Infrastructure in 2026.

  • Simon Späti - Data Engineering: Trends and Predictions.

  • Darren Wood - Data predictions for 2026.

Kafka and Event Streaming

  • 🔥 Another good blog post from Stefan Kecskes, looking at Dead Letter Queue (DLQ) handling in Kafka. For a hit of nostalgia, here’s a blog post that I wrote in 2019 also looking at DLQs in Kafka Connect.

  • How many TUIs for Kafka is too many? Well, we’re not there yet, and Hoa Nguyen brings us LazyKafka. Out of curiosity I did a quick Google and found seven TUIs in total, including this one: kafka2i / kaftui / ktea / yozefu / kaskade / ktui

  • Into the YAKR (Yet Another Kafka Replacement) category comes KafScale, an Apache 2.0 licensed Kafka-on-S3 broker written by Alexander Alten. It has support for Iceberg and SQL.

  • Good write-up from Sky Kistler at Reddit on how they migrated their 500+ EC2-based Kafka brokers to k8s-hosted Strimzi, with some impressive numbers - 500+ brokers serving tens of millions of messages per second and storing over a petabyte in live topic data.

  • 🔥 Gwen Shapira’s 2017 QCon talk Streaming Microservices: Contracts & Compatibility is one that I keep coming back to over the years. Loosely-coupled services need contracts; just because you’re using Kafka and not REST, it doesn’t mean you escape that truth.

  • Tansu is a replacement Kafka broker written in Rust, with the interesting twist that it uses Postgres, S3, or SQLite for its storage. There have been some interesting recent blog posts, including two on internals performance tuning (1 2), as well as deploying it on a t3.micro instance on AWS. And, if you enjoy lengthy conversations, you’ll want to check out this 3.5hr interview between Stanislav Kozlovski (a.k.a. "2 Minute Streaming") and the author of Tansu, Peter Morgan.

  • A good roundup from Kafka PMC Chair Mickael Maison looking at some of the milestones in the Kafka project in 2025.

Analytics

Stream Processing

  • A deep-dive (of course) from regular Anton Borisov, looking at Flink 2.2’s improvements to Delta Join, including support for CDC Upserts.

  • Riskified’s Gal Krispel has an interesting talk on YouTube about using Flink SQL and DataStream together to overcome some issues they found when using SQL alone. The talk is in Hebrew but the auto-translation of the captions by YouTube is good enough to follow along.

  • Somewhat of an interloper to this blog, but Apache Spark has just released version 4.1. This blog post covers some of the new features, including Spark Declarative Pipelines (SDP), as well as lower-latency streaming capabilities with "Real-Time Mode" (RTM) implemented with SPARK-53736.

  • 🔥 Jonas Geiregat has a useful discussion of managing Kafka Streams and its memory usage.

Data Platforms, Architectures, and Modelling

  • 🔥 Jesus Gomez at Fresha has a good blog post looking at some of the required changes to their data modelling approach when migrating from Snowflake to StarRocks.

  • Dejan Menges has written up a two-part series about Vinted’s event-driven platform. It’s fairly high-level, and I’m hoping there’ll be a part 3 (if not more) that takes a deeper dive into some of the specifics.

  • Saubhagya Awaneesh and colleagues at Grab have published details of their real-time customer data platform built on technologies including StarRocks, Flink, and Kafka.

  • 🔥 Mark Rittman has an insightful post looking at a fifty-year cycle of tools promising to democratise data work, each delivering genuine value while leaving the fundamental need for specialists stubbornly intact.

  • We’ve been trying to prise Excel from our users' hands for decades with no success, regardless of the shininess of the replacement. Jelle De Vleminck lays out an argument for why this is so, and why it’s perhaps a misguided goal.

  • Joe Reis on Data Identity Politics and The Kimball vs. Inmon War (Bill Inmon recently re-published some of his material, always worth reading).

  • The hype around Data Mesh may have subsided, but it’s still an interesting concept. Sebastian Werner and his colleagues at ThoughtWorks have taken a look at where data mesh is at in 2026.

Data Engineering, Pipelines, and CDC

Open Table Formats (OTF), Catalogs, Lakehouses etc.

  • As fun as it is importing a ton of Hadoop dependencies every time we want to use Parquet /s, there might be a better alternative: my colleague Gunnar Morling is building a proof of concept called Hardwood as a minimal-dependency implementation of Parquet. (I also like the Parquet / Hardwood wordplay ;) )

  • Catalogs are, for me, one of the most confusing aspects of the data platform ecosystem. The term is so overloaded, and numerous products in the space overlap in functionality too. Hari Thatavarthy does a good job of explaining the evolving role of the data catalog. For a Flink-specific spin, check out my primer on catalogs in Flink SQL.

Iceberg

Delta Lake and Hudi

  • I enjoyed this post from Prem Vishnoi in which they examine the assumption that Delta is solely a "Spark thing", and look at writing to Delta from Flink and Kafka directly. If this is something you want to do, you might also be interested in my previous article about writing to Delta from Flink SQL.

  • 🔥 Hudi originated from Uber, and this in-depth blog post from Prashant Wason and colleagues at Uber describes in detail its use and deployment architecture, along with some impressive figures—6 trillion rows ingested per day, 350 PB stored, etc.

  • A couple of interesting write-ups of how Hudi is used in data platforms, from Zupee and Funding Circle.

RDBMS

General Data Stuff

  • SQLNet is a social media platform created by Vladyslav Len, in which all interactions are by SQL. Seriously. Try it out!

  • Orchestra’s Hugo Lu posits that Snowflake and Databricks are hitting their market ceiling.

  • Squirreling is a ~9 KB SQL engine with zero external dependencies for running SQL queries in the browser. Their blog post explains the background to it, and why tools like DuckDB-Wasm alone aren’t sufficient.

  • Details from Phillip LeBlanc at SpiceAI on their use of Apache DataFusion.

  • Jon Anderson explores FoundationDB (a distributed K/V database) in this blog post.

  • The team behind Responsive have pivoted from Kafka Streams to launch OpenData, "a collection of open source databases built on a common, object-native storage and infrastructure foundation."

  • A fun write-up from Tomás Senart at Axiom detailing an optimisation project on EventDB, their in-house database, eventually getting it to deliver 178 billion rows per second throughput.

AI

I warned you previously…this AI stuff is here to stay, and it’d be short-sighted to think otherwise. As I read and learn more about it, I’m going to share interesting links (the clue is in the blog post title) that I find—whilst trying to avoid the breathless hype and slop.

nano banana one-shot. prompt: "create meme: i iz now agentic"

Well, gosh darnit. Didn’t this just blow up in the last month? Whilst Sonnet 4.5 was just trundling along giving ammo to the AI deniers, Opus 4.5 has come and blown things out of the water. If you’ve no idea what I’m talking about, have a read / listen to Casey Newton’s recent thoughts.

In an industry in which the term super-exciting has been devalued, what is happening now is, genuinely, SUPER-SUPER-EXCITING.

I’ll open with DHH’s tweet:

Not a fan of DHH? How about Charity Majors: 2025 was for AI what 2010 was for cloud.

Stop and think about that. In the late 2000s cloud was this thing that of course the vendors were trying to sell (vendors gonna vend), and some people got, but most people sniffed at or ignored. Now 15 years later…who’s ignoring cloud?

Anyway, I’ll save my pontificating for another blog post. But there are a ton of interesting links to share with you about AI, so here they are. You’ll have to excuse the lack of narrative on each one; there are just too many this month :)

Looking forward & looking back

Strategy & Ideas

Using AI—Product

Using AI—Engineering

And finally…

Nothing to do with data, but stuff that I’ve found interesting or has made me smile.

Nerd

