Feb. 1, 2024, 12:41 p.m. | Tianqing Fang, Wenxuan Zhou, Fangyu Liu, Hongming Zhang, Yangqiu Song, Muhao Chen

cs.CL updates on arXiv.org

Data Augmentation (DA) is frequently used to automatically provide additional training data without extra human annotation. However, data augmentation may introduce noisy data that impairs training. To guarantee the quality of augmented data, existing methods either assume that no noise exists in the augmented data and adopt consistency training, or use simple heuristics such as training loss and diversity constraints to filter out "noisy" data. However, the filtered-out examples may still contain useful information, and dropping them completely causes a loss …
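To make the loss-based filtering heuristic concrete, here is a minimal sketch of that kind of baseline (the approach the abstract critiques, not the authors' proposed method). The function name `filter_by_loss`, the `keep_ratio` parameter, and the rule "drop the highest-loss fraction" are illustrative assumptions; real pipelines may use fixed thresholds, per-class statistics, or losses averaged over several epochs.

```python
from typing import List, Sequence, Tuple


def filter_by_loss(
    examples: Sequence[str],
    losses: Sequence[float],
    keep_ratio: float = 0.8,
) -> Tuple[List[str], List[str]]:
    """Split augmented examples into (kept, dropped) by per-example loss.

    Hypothetical heuristic: the examples with the highest training loss are
    treated as "noisy" and discarded. keep_ratio is an assumed parameter.
    """
    if len(examples) != len(losses):
        raise ValueError("examples and losses must have the same length")
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")

    # Rank example indices from lowest to highest loss.
    order = sorted(range(len(examples)), key=lambda i: losses[i])
    n_keep = int(round(keep_ratio * len(examples)))
    kept_idx = set(order[:n_keep])

    # Preserve the original ordering within each group.
    kept = [ex for i, ex in enumerate(examples) if i in kept_idx]
    dropped = [ex for i, ex in enumerate(examples) if i not in kept_idx]
    return kept, dropped


if __name__ == "__main__":
    augmented = ["a", "b", "c", "d", "e"]
    per_example_loss = [0.1, 2.5, 0.3, 0.2, 1.9]
    kept, dropped = filter_by_loss(augmented, per_example_loss, keep_ratio=0.6)
    print(kept)     # ['a', 'c', 'd']
    print(dropped)  # ['b', 'e']
```

Everything in `dropped` contributes nothing further to training, which is exactly the information loss the abstract points to when filtered examples still carry useful signal.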
