May 16, 2023, 5:56 p.m. | /u/jonas__m

Machine Learning www.reddit.com

Hello Redditors!

I'm excited to share **Datalab** — a *linter* for datasets.

[These real-world issues are automatically found by Datalab.](https://preview.redd.it/lqqo84vn380b1.png?width=637&format=png&auto=webp&v=enabled&s=4cf7388d7571fd40f326fafb121974d22090a319)

I recently published a [blog post](https://cleanlab.ai/blog/datalab/) introducing **Datalab**, along with an [open-source](https://github.com/cleanlab/cleanlab) Python implementation that is easy to use for all data types (image, text, tabular, audio, etc.). For data scientists, I’ve made a quick [Jupyter tutorial](https://docs.cleanlab.ai/stable/tutorials/datalab/datalab_quickstart.html) for running **Datalab** on your own data.
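To give a rough idea of what running it looks like, here is a minimal sketch rather than the tutorial's exact code. It assumes `cleanlab` is installed with its Datalab extras (`pip install "cleanlab[datalab]"`) together with NumPy and scikit-learn. The helper names `make_toy_data` and `lint_dataset`, the toy two-cluster data, and the choice of logistic regression to produce out-of-sample predicted probabilities are all illustrative assumptions; any reasonable model for your data should work in their place.

```python
import random


def make_toy_data(n_per_class=50, seed=0):
    """Hypothetical helper: two Gaussian blobs with one deliberately flipped label."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            features.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            labels.append(label)
    labels[0] = 1  # inject a label error for the audit to (hopefully) flag
    return features, labels


def lint_dataset(features, labels):
    """Hypothetical wrapper: audit a labeled dataset with Datalab.

    Third-party imports live inside the function so the module can be
    loaded even where cleanlab is not installed.
    """
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab import Datalab

    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)

    # Out-of-sample predicted class probabilities from a simple model
    # (an assumption; swap in whatever model suits your data).
    pred_probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )

    lab = Datalab(data={"label": y}, label_name="label")
    lab.find_issues(pred_probs=pred_probs, features=X)
    lab.report()             # prints a summary of the issues found
    return lab.get_issues()  # per-example issue flags and scores
```

Calling `lint_dataset(*make_toy_data())` prints the report and returns a table with one row per example. Which issue types are checked depends on the inputs you pass to `find_issues`, so treat this as a starting point and see the tutorial for the full workflow.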

All of us who have dealt with real-world data know it’s full of issues like label errors, outliers, …
