Nov. 8, 2023, 1:56 p.m. | /u/mrocklin

Data Science www.reddit.com

I gave this talk at PyData NYC last week. It was fun working with devs from various projects (Dask, Arrow, Polars, Spark) in the week leading up to the event. Thought I'd share a re-recording of it here:

[https://youtu.be/wKH0-zs2g\_U](https://youtu.be/wKH0-zs2g_U)

This is the result of a couple of weeks of work comparing large data frameworks on benchmarks ranging in size from 10 GB to 10 TB. No project wins. It's really interesting analyzing the results, though.

DuckDB and Dask are the only projects that reliably finish …

