Nov. 18, 2023, 5:04 a.m. | Matt Collins

Towards Data Science - Medium towardsdatascience.com

Parallelising Python on Spark: Options for Concurrency with Pandas

Leverage the benefits of Spark when working with Pandas

Photo by Florian Steciuk on Unsplash

In my previous role, I spent some time working on an internal project to predict future disk storage usage for our Managed Services customers across thousands of disks. Each disk has its own usage patterns, so we need a separate machine learning model for each disk which takes historical data to …

databricks machine learning parallel-computing python spark
