Aug. 5, 2022, 1:25 p.m. | /u/MGeeeeeezy

Data Science www.reddit.com

What do you use PySpark for, and what are its advantages over a Pandas DataFrame?

If I want to run operations concurrently in Pandas, I typically just use joblib with sharedmem and get a great boost.
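For anyone unfamiliar with that pattern, here is a minimal sketch of what I mean by joblib with sharedmem. The DataFrame, its column names, and the `group_total` helper are made up for illustration. The only real API used is `joblib.Parallel` with `require="sharedmem"`, which makes joblib choose a thread-based backend so the workers read the same in-memory DataFrame instead of getting pickled copies.

```python
import pandas as pd
from joblib import Parallel, delayed

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "group": ["a", "b", "a", "c", "b", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

def group_total(key):
    # Reads the shared DataFrame directly. With require="sharedmem",
    # joblib uses threads, so df is not copied to each worker.
    return key, int(df.loc[df["group"] == key, "value"].sum())

keys = sorted(df["group"].unique())

# Results come back in the same order as the submitted tasks.
results = Parallel(n_jobs=2, require="sharedmem")(
    delayed(group_total)(k) for k in keys
)

totals = dict(results)
print(totals)
```

Since the workers are threads, how much of a boost this gives probably depends on how much of the work releases the GIL, and everything still has to fit in one machine's memory.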

datascience pyspark
