April 16, 2024, 4:51 a.m. | Juhwan Choi, Jungmin Yun, Kyohoon Jin, YoungBin Kim

cs.CL updates on arXiv.org

arXiv:2404.09682v1 Announce Type: new
Abstract: Dataset quality is crucial for the performance and reliability of downstream task models. However, datasets often contain noisy data inadvertently introduced during construction. Numerous attempts have been made to correct such noise with human annotators, but hiring and managing human annotators is expensive and time-consuming. As an alternative, recent studies have explored the use of large language models (LLMs) for data annotation.
In this study, we present a case …
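The abstract describes the general idea of replacing human annotators with an LLM that re-checks potentially noisy labels. As a rough illustrative sketch only (not the paper's actual pipeline), the snippet below re-labels a toy sentiment dataset and flags disagreements with the original labels. The `query_llm` callable, the `build_prompt` helper, and the binary label set are all assumptions introduced for illustration; `query_llm` is a hypothetical stand-in for any chat-completion call.

```python
# Illustrative sketch of LLM-based label verification (assumed setup, not the paper's method).
from typing import Callable, List, Tuple

LABELS = ["positive", "negative"]  # assumed binary sentiment task for illustration


def build_prompt(text: str) -> str:
    """Ask the model to pick exactly one label from the allowed set."""
    return (
        f"Classify the sentiment of the following sentence as {' or '.join(LABELS)}. "
        "Answer with the label only.\n\n"
        f"Sentence: {text}"
    )


def relabel_noisy_data(
    dataset: List[Tuple[str, str]],
    query_llm: Callable[[str], str],
) -> List[Tuple[str, str, bool]]:
    """Return (text, llm_label, disagrees_with_original) for each example."""
    results = []
    for text, original_label in dataset:
        answer = query_llm(build_prompt(text)).strip().lower()
        # Keep the original label if the model's answer is not a valid label.
        llm_label = answer if answer in LABELS else original_label
        results.append((text, llm_label, llm_label != original_label))
    return results


if __name__ == "__main__":
    # Toy dataset with one deliberately wrong label; a rule-based stub stands in for the LLM.
    noisy = [("I loved this movie.", "negative"), ("Terrible service.", "negative")]
    stub = lambda prompt: "positive" if "loved" in prompt else "negative"
    for text, label, changed in relabel_noisy_data(noisy, stub):
        print(f"{text!r} -> {label} (changed: {changed})")
```

In practice, the stub would be replaced by a real chat-completion call, and flagged disagreements could either be corrected automatically or routed to a human reviewer.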

