rmoff's random ramblings

Quickly Convert CSV to Parquet with DuckDB

Published Mar 14, 2023 by Robin Moffatt in DuckDB at https://rmoff.net/2023/03/14/quickly-convert-csv-to-parquet-with-duckdb/

Here’s a neat little trick you can use with DuckDB to convert a CSV file into a Parquet file:

COPY (SELECT *
        FROM read_csv('~/data/source.csv', AUTO_DETECT=TRUE))
  TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
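
To check that the conversion did what you expected, you can read the Parquet file straight back in DuckDB. This is just a sketch that reuses the target path from the example above; the queries are my own addition, and what they return depends entirely on your data.

```sql
-- Column names and types that DuckDB inferred from the CSV
DESCRIBE SELECT * FROM read_parquet('~/data/target.parquet');

-- Row count, to compare against the source file
SELECT COUNT(*) FROM read_parquet('~/data/target.parquet');

-- Compression codec recorded for each column chunk
SELECT path_in_schema, compression
  FROM parquet_metadata('~/data/target.parquet');
```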

You can also modify the schema along the way, selecting specific fields and renaming them:

COPY (SELECT col1, col2, col3 AS foo
        FROM read_csv('~/data/source.csv', AUTO_DETECT=TRUE))
  TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
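
The same pattern should also let you change column types rather than relying on what auto-detection picks. Here's a sketch of that, assuming (purely for illustration) that col1 holds whole numbers and col2 should be stored as text; swap in whatever types fit your own data.

```sql
-- Assumption: col1 is numeric and col2 is better kept as a string.
-- CAST raises an error if a value can't be converted.
COPY (SELECT CAST(col1 AS INTEGER) AS col1,
             CAST(col2 AS VARCHAR) AS col2,
             col3 AS foo
        FROM read_csv('~/data/source.csv', AUTO_DETECT=TRUE))
  TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
```

If you'd rather get NULLs than an error for values that don't convert, TRY_CAST can be used in place of CAST.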

Read more on the DuckDB CSV and Parquet docs pages.


Robin Moffatt

Robin Moffatt is a Principal DevEx Engineer at LakeFS. He likes writing about himself in the third person, eating good breakfasts, and drinking good beer.

