May 23, 2024, 11:22 a.m. | /u/mrocklin

Data Science www.reddit.com

I hit publish on a blogpost last week on running Spark, Dask, DuckDB, and Polars on the TPC-H benchmark across a variety of scales (10 GiB, 100 GiB, 1 TiB, 10 TiB), both locally on a Macbook Pro and on the cloud.  It’s a broad set of configurations.  The results are interesting.

No project wins uniformly.  They all perform differently at different scales: 

* DuckDB and Polars are crazy fast on local machines
* Dask and DuckDB seem to win …

benchmark benchmarks cloud dask datascience duckdb macbook results running set spark

Senior Data Engineer

@ Displate | Warsaw

Decision Scientist

@ Tesco Bengaluru | Bengaluru, India

Senior Technical Marketing Engineer (AI/ML-powered Cloud Security)

@ Palo Alto Networks | Santa Clara, CA, United States

Associate Director, Technology & Data Lead - Remote

@ Novartis | East Hanover

Product Manager, Generative AI

@ Adobe | San Jose

Associate Director – Data Architect Corporate Functions

@ Novartis | Prague