Aug. 17, 2022, 2:19 p.m. | Michael Berk

Towards Data Science - Medium towardsdatascience.com

The default file format for any data science workflow

Have you ever used pd.read_csv() in pandas? Well, that command could have run ~50x faster if the data had been stored as Parquet instead of CSV and read with pd.read_parquet().

In this post we will discuss Apache Parquet, an extremely efficient and well-supported file format. The post is geared toward data practitioners (ML, DE, DS), so we'll focus on high-level concepts and use SQL to walk through the core ideas, but links …
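To make the columnar idea concrete, here is a toy sketch in plain Python (stdlib only). It is not the Parquet format: the "columnar" layout below is simply one text chunk per column, and the table, column names, and helper functions are hypothetical. What it illustrates is why a query like `SELECT amount FROM t` only needs to touch one column in a column-oriented file, whereas a CSV reader has to scan and split every full row.

```python
import csv
import io

# Hypothetical toy table: (user_id, country, amount). Values are made up for illustration.
COLUMNS = ["user_id", "country", "amount"]
ROWS = [(i, "US" if i % 2 else "DE", round(i * 1.5, 2)) for i in range(1000)]


def write_row_oriented(rows):
    """Serialize as CSV text: each record stores all of its fields together."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def write_column_oriented(rows):
    """Toy columnar layout (NOT real Parquet): one newline-joined text chunk per column."""
    return {
        name: "\n".join(str(row[idx]) for row in rows)
        for idx, name in enumerate(COLUMNS)
    }


def read_column_from_csv(text, name):
    """Reading one column from CSV still means parsing every full line."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    idx = header.index(name)
    values = [row[idx] for row in reader]
    return values, len(text)  # characters scanned: the whole file


def read_column_from_columnar(chunks, name):
    """With a columnar layout, only the requested column's chunk is read."""
    chunk = chunks[name]
    return chunk.split("\n"), len(chunk)  # characters scanned: just this column


csv_text = write_row_oriented(ROWS)
columnar = write_column_oriented(ROWS)

csv_values, csv_scanned = read_column_from_csv(csv_text, "amount")
col_values, col_scanned = read_column_from_columnar(columnar, "amount")

print(f"CSV scanned {csv_scanned} chars; columnar scanned {col_scanned} chars")
```

Both reads return the same values, but the columnar read touches only a fraction of the data. Real Parquet files go further, storing typed binary values with per-column encoding and compression, plus metadata that lets readers skip data they don't need. In pandas, the analogous calls would be `df.to_parquet(path)` and `pd.read_parquet(path, columns=["amount"])`, which require an engine such as pyarrow or fastparquet to be installed.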
