Understanding the Dataset Practitioners Behind Large Language Model Development
Feb. 27, 2024, 5:50 a.m. | Crystal Qian, Emily Reif, Minsuk Kahng
cs.CL updates on arXiv.org arxiv.org
Abstract: As large language models (LLMs) become more advanced and impactful, it is increasingly important to scrutinize the data that they rely upon and produce. What is it to be a dataset practitioner doing this work? We approach this in two parts: first, we define the role of "dataset practitioner" by performing a retrospective analysis on the responsibilities of teams contributing to LLM development at Google. Then, we conduct semi-structured interviews with a cross-section of these …