April 21, 2022, 5:01 p.m. | /u/cgnorthcutt

Machine Learning www.reddit.com

Hi folks. This morning I released the new [cleanlab 2.0](https://github.com/cleanlab/cleanlab) Python package for automatically finding errors in datasets and for doing machine learning/analytics with real-world, messy data and labels.
tl;dr - cleanlab provides a framework to streamline data-centric AI.
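To make "automatically finding errors in datasets" concrete, here is a minimal sketch of one way label errors can be flagged from a model's predicted probabilities. It is a simplified, confident-learning-style heuristic written from scratch for illustration. It is not cleanlab's implementation or API. The function names `per_class_thresholds` and `flag_likely_label_errors` are hypothetical, and the sketch assumes `pred_probs` are out-of-sample (for example, cross-validated) probabilities with one row per example. For real use, the package's own documented entry point (in 2.x, `cleanlab.filter.find_label_issues`) should be preferred.

```python
from typing import List, Sequence


def per_class_thresholds(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]], num_classes: int
) -> List[float]:
    """Hypothetical helper: threshold for class k is the mean predicted
    probability of class k over the examples whose given label is k."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled examples gets threshold 1.0 (rarely "confident").
    return [sums[k] / counts[k] if counts[k] else 1.0 for k in range(num_classes)]


def flag_likely_label_errors(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[int]:
    """Hypothetical helper: return indices whose given label disagrees with the
    most probable class among the classes the model is confident about."""
    num_classes = len(pred_probs[0])
    thresholds = per_class_thresholds(labels, pred_probs, num_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability reaches that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # model is not confident about any class; skip
        guess = max(confident, key=lambda k: p[k])
        if guess != y:
            issues.append(i)
    return issues


# Toy data (made up for illustration): examples 2 and 5 look mislabeled.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly prefers class 1
    [0.2, 0.8],
    [0.3, 0.7],
    [0.9, 0.1],  # labeled 1, but the model strongly prefers class 0
]

print(flag_likely_label_errors(labels, pred_probs))  # [2, 5]
```

Using per-class thresholds instead of a single global cutoff keeps the check fair to classes the model is systematically less confident about.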



After the [1.0 launch](https://www.reddit.com/r/MachineLearning/comments/e03m49/p_cleanlab_accelerating_ml_and_deep_learning/) last year, engineers used cleanlab at [Google](https://cleanlab.ai/blog/cleanlab-history/#2019) to clean and train robust models on speech data, at [Amazon](https://cleanlab.ai/blog/cleanlab-history/#2017) to estimate how often the Alexa device doesn’t wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, …
