SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets. (arXiv:2111.06467v2 [cs.CL] UPDATED) | allainews.com

Jan. 14, 2022, 2:11 a.m. | Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann

cs.LG updates on arXiv.org arxiv.org

NLP researchers need more, higher-quality text datasets. Human-labeled
datasets are expensive to collect, while datasets collected via automatic
retrieval from the web such as WikiBio are noisy and can include undesired
biases. Moreover, data sourced from the web is often included in datasets used
to pretrain models, leading to inadvertent cross-contamination of training and
test sets. In this work we introduce a novel method for efficient dataset
curation: we use a large language model to provide seed generations to human …

ai arxiv case study collaborative datasets human study text

More from arxiv.org / cs.LG updates on arXiv.org

Stochastic Optimal Control Matching 12 hours ago | arxiv.org

arxiv control cs.lg cs.na +6

Value Approximation for Two-Player General-Sum Differential Games with State Constraints 12 hours ago | arxiv.org

abstract approximation arxiv constraints +20

Can We Edit Multimodal Large Language Models? 12 hours ago | arxiv.org

arxiv cs.ai cs.cl cs.cv +9

XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation 12 hours ago | arxiv.org

ai benchmark arxiv benchmark cs.cv +7

Generalized Schr\"odinger Bridge Matching 12 hours ago | arxiv.org

arxiv bridge cs.lg generalized +3

Tight bounds on Pauli channel learning without entanglement 12 hours ago | arxiv.org

abstract algorithms arxiv cs.it +9

Automated mapping of virtual environments with visual predictive coding 12 hours ago | arxiv.org

abstract access algorithms arxiv +28

Confident Feature Ranking 12 hours ago | arxiv.org

abstract arxiv cs.ai cs.lg +14

Integrated Sensing-Communication-Computation for Edge Artificial Intelligence 12 hours ago | arxiv.org

abstract advanced and edge ai artificial +27

Senior Marketing Data Analyst

@ Amazon.com | Amsterdam, North Holland, NLD

View on ai-jobs.net

Senior Data Analyst

@ MoneyLion | Kuala Lumpur, Kuala Lumpur, Malaysia

View on ai-jobs.net

Data Management Specialist - Office of the CDO - Chase- Associate

@ JPMorgan Chase & Co. | LONDON, LONDON, United Kingdom

View on ai-jobs.net

BI Data Analyst

@ Nedbank | Johannesburg, ZA

View on ai-jobs.net

Head of Data Science and Artificial Intelligence (m/f/d)

@ Project A Ventures | Munich, Germany

View on ai-jobs.net

Senior Data Scientist - GenAI

@ Roche | Hyderabad RSS

View on ai-jobs.net