Digging into Ducklake
After a week’s holiday ("vacation", for y’all in the US) without a glance at anything work-related, what joy to return and find that the …
Building a data pipeline with DuckDB
In this blog post I’m going to explore how, as a data engineer in the field today, I might go about putting together a rudimentary data pipeline. I’ll …
Exporting Notebooks from DuckDB UI
DuckDB added a very cool UI last week and I’ve been using it as my primary interface to DuckDB since. One thing that bothered me was that the SQL I …
Kicking the tyres on the new DuckDB UI
I wrote a couple of weeks ago about using DuckDB and Rill Data to explore a new data source that I’m working with. I wanted to understand the data’s …
Exploring UK Environment Agency data in DuckDB and Rill
The UK Environment Agency publishes a feed of data relating to rainfall and river levels. As a prelude to building a streaming pipeline with this …
DuckDB tricks - renaming fields in a SELECT * across tables
I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I’d not worked out yet which columns I actually wanted. …
1️⃣🐝🏎️🦆 (1BRC in SQL with DuckDB)
Why should the Java folk have all the fun?! My friend and colleague Gunnar Morling launched a fun challenge this week: how fast can you aggregate and …
Quickly Convert CSV to Parquet with DuckDB
Here’s a neat little trick you can use with DuckDB to convert a CSV file into a Parquet file: COPY (SELECT * FROM …
Aligning mismatched Parquet schemas in DuckDB
What do you do when you want to query over multiple Parquet files but the schemas don’t quite line up? Let’s find out 👇🏻
Data Engineering in 2022: Wrangling the feedback data from Current 22 with dbt
I started my dbt journey by poking and pulling at the pre-built jaffle_shop demo running with DuckDB as its data store. Now I want to see if I can …
Data Engineering in 2022: Exploring dbt with DuckDB
I’ve been wanting to try out dbt for some time now, and a recent long-haul flight seemed like the obvious opportunity to do so. Except many of the …
Current 22 - Session Analysis with DuckDB and Jupyter Notebook
At Current 2022 the audience was given the option to submit ratings. Here’s some analysis I’ve done on the raw data. It’s …