Spark, Dask, DuckDB, Polars: TPC-H Benchmarks at Scale

Nov. 8, 2023, 1:56 p.m. | /u/mrocklin

Data Science www.reddit.com

I gave this talk at PyData NYC last week. It was fun working with devs from various projects (Dask, Arrow, Polars, Spark) in the week leading up to the event. Thought I'd share a re-recording of it here:

[https://youtu.be/wKH0-zs2g_U](https://youtu.be/wKH0-zs2g_U)

This is the result of a couple of weeks of work comparing large data frameworks on benchmarks ranging in size from 10 GB to 10 TB. No project wins. Analyzing the results is really interesting, though.

DuckDB and Dask are the only projects that reliably finish …


