Interesting links - February 2025

Here’s a bunch of interesting links and articles about data that I’ve come across recently.

🔥 Not got time for all this? I’ve marked my top reads of the month :)
📧 Want to receive this monthly round-up as an email? Subscribe to my Substack where I cross-post the same content
🔗 Medium posts often skulk behind a gate, so I’ve hyperlinked to the Freedium version. You’ll see [Medium ↗] next to each link if you prefer the original.

Martin Kleppmann’s seminal talk from 2015, Turning the database inside out, came up on my feed recently, and is still such an important work.
Going back even further, check out the original SQL paper, from 1974.
Not only do I love the clever title, but The End of the Bronze Age: Rethinking the Medallion Architecture is also a really good explanation of how "shift left" applies in the data world. If you prefer video there’s one of those too.
An interesting interview from a while back with Materialize’s CTO, Nikhil Benesch.
A useful look at the practicalities of Data Products, Data Contracts, and Change Data Capture.
I recently came across an interesting project from the European Union called Big Data Test Infrastructure. Despite the slightly old-fashioned name, they’re doing some cool stuff with data and public services, such as this one looking at tree health in a town in Germany.
Datadog have their own proprietary event storage system called Husky. They’ve previously shared details of the ingestion process, and have recently posted about how they handle data compaction at scale.
Two Apache projects were recently announced as graduating to top-level projects, including Apache StreamPark.
Excellent analysis from Jack Vanlightly looking at Why Snowflake wants streaming (specifically, Redpanda, about whom acquisition rumours are swirling).
A new Kafka TUI called kplay, and GUI called KafkIO.
What better way to learn the low-level details of Kafka than writing your own broker?
Confluent recently launched a VSCode plugin which now supports Kafka clusters too (not just Confluent Cloud).
A fantastic deep-dive blog on Kafka transactions.
A nicely explained and illustrated guide to windowing in Kafka Streams.
Mickael Maison has now been writing the Kafka Monthly Digest for an impressive seven years!
Hyprstream is built on Apache Arrow Flight and DuckDB for "real-time data ingestion, windowed aggregation, caching, and serving". Read the associated paper here.
Your Database Skills Are Not 'Good to Have'
Uber run 2,300 MySQL clusters, and this post has details of how they do it.

If you like these kinds of links you might like to read about How I Try To Keep Up With The Data Tech World (A List of Data Blogs)