Analyzing Dataset Annotation Quality Management in the Wild
March 12, 2024, 4:52 a.m. | Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych
Source: cs.CL updates on arXiv.org
Abstract: Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exist, to our knowledge, large-scale analysis has yet to be performed on how quality management is conducted when creating natural language …