Spark, Dask, DuckDB, Polars: TPC-H Benchmarks at Scale

Nov. 8, 2023, 1:56 p.m. | /u/mrocklin

Data Science www.reddit.com

I gave this talk at PyData NYC last week. It was fun working with devs from various projects (Dask, Arrow, Polars, Spark) in the week leading up to the event. Thought I'd share a re-recording of it here:

https://youtu.be/wKH0-zs2g_U

This is the result of a couple of weeks of work comparing large data frameworks on benchmarks ranging in size from 10 GB to 10 TB. No project wins. The results are really interesting to analyze, though.

DuckDB and Dask are the only projects that reliably finish …
