March 12, 2024, 4:52 a.m. | Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

cs.CL updates on arXiv.org

arXiv:2307.08153v4 Announce Type: replace
Abstract: Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exist, to our knowledge, large-scale analysis has yet to be performed on how quality management is conducted when creating natural language …

