March 12, 2024, 4:52 a.m. | Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

cs.CL updates on arXiv.org

arXiv:2307.08153v4 Announce Type: replace
Abstract: Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exist, to our knowledge, large-scale analysis has yet to be performed on how quality management is conducted when creating natural language …

