[P] Datalab: A Linter for ML Datasets
May 16, 2023, 5:56 p.m. | /u/jonas__m
Machine Learning www.reddit.com
I'm excited to share **Datalab** — a *linter* for datasets.
[These real-world issues are automatically found by Datalab.](https://preview.redd.it/lqqo84vn380b1.png?width=637&format=png&auto=webp&v=enabled&s=4cf7388d7571fd40f326fafb121974d22090a319)
I recently published a [blog post](https://cleanlab.ai/blog/datalab/) introducing **Datalab**, along with an [open-source](https://github.com/cleanlab/cleanlab) Python implementation that is easy to use with all data types (image, text, tabular, audio, etc.). For data scientists, I’ve made a quick [Jupyter tutorial](https://docs.cleanlab.ai/stable/tutorials/datalab/datalab_quickstart.html) for running **Datalab** on your own data.
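To give a feel for what running it looks like, here is a minimal sketch rather than the tutorial's exact code. It assumes a cleanlab release that ships `Datalab` (plus scikit-learn, NumPy, and the `datasets` package) is installed. The toy data, the helper names `make_toy_data` and `audit_with_datalab`, and the choice of logistic regression are all made up for illustration.

```python
import random


def make_toy_data(seed=0):
    """Hypothetical toy data: two 2-D Gaussian blobs, with two labels flipped on purpose."""
    rng = random.Random(seed)
    X = [[rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)] for _ in range(50)]
    X += [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(50)]
    y = [0] * 50 + [1] * 50
    flipped = [3, 60]  # indices whose labels we corrupt deliberately
    for i in flipped:
        y[i] = 1 - y[i]
    return X, y, flipped


def audit_with_datalab(X, y):
    """Run Datalab on numeric features and (possibly noisy) class labels."""
    # Third-party imports are kept inside the function so the toy-data helper
    # above stays usable even without cleanlab installed.
    import numpy as np
    from cleanlab import Datalab
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)

    # Out-of-sample predicted probabilities from any classifier you like;
    # logistic regression is just an assumption made for this sketch.
    pred_probs = cross_val_predict(
        LogisticRegression(), X, y, cv=5, method="predict_proba"
    )

    # Only the labels are stored in the dataset here; features are passed separately.
    lab = Datalab(data={"y": y}, label_name="y")
    lab.find_issues(pred_probs=pred_probs, features=X)
    lab.report()             # prints a human-readable summary of detected issues
    return lab.get_issues()  # per-example table of issue flags and scores


# Example usage (requires cleanlab and scikit-learn):
# X, y, flipped = make_toy_data()
# issues = audit_with_datalab(X, y)
```

On real data you would swap in your own features, labels, and model; the linked tutorial remains the authoritative walkthrough.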
All of us who have dealt with real-world data know it’s full of various issues like label errors, outliers, …