Detecting Label Errors using Pre-Trained Language Models. (arXiv:2205.12702v1 [cs.CL])
May 26, 2022, 1:12 a.m. | Derek Chong, Jenny Hong, Christopher D. Manning
cs.CL updates on arXiv.org arxiv.org
We show that large pre-trained language models are extremely capable of identifying label errors in datasets: simply verifying data points in descending order of out-of-distribution loss significantly outperforms more complex mechanisms for detecting label errors on natural language datasets. We contribute a novel method to produce highly realistic, human-originated label noise from crowdsourced data, and demonstrate the effectiveness of this method on TweetNLP, providing an otherwise difficult-to-obtain measure of realistic recall.
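The core procedure the abstract describes is simple to sketch: score each example by its loss under a pre-trained model evaluated out of distribution, then have annotators verify examples in descending loss order, since high-loss examples are the most likely to be mislabeled. Below is a minimal illustration of that ranking step; the loss values are hypothetical stand-ins for what would, in the paper's setting, come from a pre-trained language model's cross-entropy on the observed label.

```python
# Sketch of loss-based label-error detection: verify examples in
# descending order of their (out-of-distribution) loss.
# The loss values here are hypothetical stand-ins; in practice they
# would be computed by a pre-trained language model on held-out data.

def rank_by_loss(losses):
    """Return example indices sorted by loss, highest first."""
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)

# Hypothetical per-example losses; index 2 has a suspiciously high
# loss, suggesting its label may be wrong.
losses = [0.12, 0.35, 4.80, 0.09, 1.10]
order = rank_by_loss(losses)
print(order)  # → [2, 4, 1, 0, 3]: verify example 2 first
```

The appeal of this approach, per the abstract, is that this single sorted pass outperforms more complex label-error detection mechanisms on natural language datasets.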