Digging into DuckLake

After a week’s holiday ("vacation", for y’all in the US) without a glance at anything work-related, what joy to return and find that the …

Building a data pipeline with DuckDB

In this blog post I’m going to explore how, as a data engineer in the field today, I might go about putting together a rudimentary data pipeline. I’ll …

Exporting Notebooks from DuckDB UI

DuckDB added a very cool UI last week and I’ve been using it as my primary interface to DuckDB since. One thing that bothered me was that the SQL I …

Kicking the tyres on the new DuckDB UI

I wrote a couple of weeks ago about using DuckDB and Rill Data to explore a new data source that I’m working with. I wanted to understand the data’s …

Exploring UK Environment Agency data in DuckDB and Rill

The UK Environment Agency publishes a feed of data relating to rainfall and river levels. As a prelude to building a streaming pipeline with this …

DuckDB tricks - renaming fields in a SELECT * across tables

I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I’d not worked out yet which columns I actually wanted. …

1️⃣🐝🏎️🦆 (1BRC in SQL with DuckDB)

Why should the Java folk have all the fun?! My friend and colleague Gunnar Morling launched a fun challenge this week: how fast can you aggregate and …

Quickly Convert CSV to Parquet with DuckDB

Here’s a neat little trick you can use with DuckDB to convert a CSV file into a Parquet file: COPY (SELECT * FROM …

Aligning mismatched Parquet schemas in DuckDB

What do you do when you want to query over multiple Parquet files but the schemas don’t quite line up? Let’s find out 👇🏻

Data Engineering in 2022: Wrangling the feedback data from Current 22 with dbt

I started my dbt journey by poking and pulling at the pre-built jaffle_shop demo running with DuckDB as its data store. Now I want to see if I can …

Data Engineering in 2022: Exploring dbt with DuckDB

I’ve been wanting to try out dbt for some time now, and a recent long-haul flight seemed like the obvious opportunity to do so. Except many of the …

Current 22 - Session Analysis with DuckDB and Jupyter Notebook

At Current 2022 the audience was given the option to submit ratings. Here’s some analysis I’ve done on the raw data. It’s …