Oct. 6, 2022, 6:42 p.m. | Elías Snorrason

Towards Data Science - Medium towardsdatascience.com

Understanding Outliers in Text Data with Transformers, cleanlab, and Topic Modeling

An open-source Python workflow to audit text datasets

Image by LubosHouska from Pixabay.

Many text corpora contain heterogeneous documents, some of which may be anomalous and worth understanding more. For deployed ML systems, in particular, we may want to automatically flag test documents that do not stem from the same distribution as their training data and understand emerging themes within these new documents that were absent from the …
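To make the idea of flagging out-of-distribution test documents concrete, here is a minimal sketch. It is not the article's workflow and does not use cleanlab's API: it swaps in a plain k-nearest-neighbor distance score, and the 2-D vectors stand in for Transformer embeddings. The function name `knn_outlier_scores`, the toy data, and the `threshold` value are all assumptions made for illustration.

```python
import math


def knn_outlier_scores(train, test, k=2):
    """Score each test vector by its mean Euclidean distance
    to its k nearest training vectors (higher = more atypical)."""
    scores = []
    for x in test:
        dists = sorted(math.dist(x, t) for t in train)
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy 2-D stand-ins for document embeddings (hypothetical data).
train_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
test_embeddings = [(0.05, 0.05), (3.0, 3.0)]

scores = knn_outlier_scores(train_embeddings, test_embeddings, k=2)

# Hypothetical cutoff; in practice it might be picked from the
# distribution of scores on held-out training documents.
threshold = 1.0
flagged = [i for i, s in enumerate(scores) if s > threshold]
print(flagged)
```

In this toy setup only the second test vector, which lies far from every training vector, exceeds the cutoff. With real documents, the vectors would come from a sentence-embedding model, and the flagged documents could then be passed to a topic model to look for themes absent from the training data.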

