all AI news
SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets. (arXiv:2111.06467v2 [cs.CL] UPDATED)
Jan. 14, 2022, 2:11 a.m. | Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann
cs.LG updates on arXiv.org arxiv.org
NLP researchers need more, higher-quality text datasets. Human-labeled
datasets are expensive to collect, while datasets collected via automatic
retrieval from the web such as WikiBio are noisy and can include undesired
biases. Moreover, data sourced from the web is often included in datasets used
to pretrain models, leading to inadvertent cross-contamination of training and
test sets. In this work we introduce a novel method for efficient dataset
curation: we use a large language model to provide seed generations to human …
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Senior Marketing Data Analyst
@ Amazon.com | Amsterdam, North Holland, NLD
Senior Data Analyst
@ MoneyLion | Kuala Lumpur, Kuala Lumpur, Malaysia
Data Management Specialist - Office of the CDO - Chase- Associate
@ JPMorgan Chase & Co. | LONDON, LONDON, United Kingdom
BI Data Analyst
@ Nedbank | Johannesburg, ZA
Head of Data Science and Artificial Intelligence (m/f/d)
@ Project A Ventures | Munich, Germany
Senior Data Scientist - GenAI
@ Roche | Hyderabad RSS