Jan. 7, 2022, 5:56 a.m. | Kevin Kho

Towards Data Science - Medium | towardsdatascience.com

Run PyCaret functions on each partition of your data in a distributed way


PyCaret is a low-code machine learning framework that automates many parts of the machine learning pipeline. With just a few lines of code, several models can be trained on a dataset. In this post, we explore how to scale this capability by running several PyCaret training jobs in a distributed manner on Spark or Dask.
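The core pattern is a single function that takes one partition's rows as a pandas DataFrame, runs PyCaret on just that slice, and returns its score grid; Fugue's transform() then maps that function over each group on the chosen backend. Below is a minimal sketch under stated assumptions: PyCaret 2.x (whose setup() accepts silent=True) and Fugue installed with the Dask extra; the "store" grouping column and "label" target are hypothetical stand-ins for your own schema.

import pandas as pd
from pycaret.classification import compare_models, pull, setup
from fugue import transform

# Toy data: two "stores", each of which should get its own models.
data = pd.DataFrame({
    "store": ["A"] * 50 + ["B"] * 50,
    "feature": range(100),
    "label": [0, 1] * 50,
})

def train_per_partition(df: pd.DataFrame) -> pd.DataFrame:
    # Runs once per partition: train several models on this group's rows
    # and return a slice of PyCaret's score grid, tagged with the group key.
    setup(data=df, target="label", silent=True, html=False)
    compare_models(n_select=3)
    scores = pull()[["Model", "Accuracy"]].copy()
    scores["store"] = df["store"].iloc[0]
    return scores

# transform() applies train_per_partition to each "store" group. With
# engine="dask" the groups run in parallel on Dask; passing a live
# SparkSession as the engine runs the identical code on Spark instead.
result = transform(
    data,
    train_per_partition,
    schema="Model:str,Accuracy:double,store:str",
    partition={"by": "store"},
    engine="dask",
)

Because the training function only ever sees plain pandas, the same code runs unchanged locally (engine=None), on Dask, or on Spark; the backend is purely a deployment choice.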

PyCaret Model Score Grid Example

First, we …

Tags: dask, data science, fugue, machine learning, pycaret, scaling, spark
