Feb. 7, 2024, 1:20 p.m. | /u/mrocklin

Data Science www.reddit.com

I really liked the simplicity of the [One Billion Row Challenge (1BRC)](https://github.com/gunnarmorling/1brc) that took off last month. It was fun to see lots of people apply different tools to the same simple-yet-clear problem “How do you parse, process, and aggregate a large CSV file as quickly as possible?”

For fun, my colleagues and I made a One Trillion Row Challenge (1TRC) dataset 🙂. Data lives on S3 in Parquet format (CSV made zero sense here) in a public bucket at …

challenge colleagues csv data datascience dataset datasets format fun parquet public sense

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne