Nov. 5, 2023, 6:41 a.m. | Naiqing Guan, Kaiwen Chen, Nick Koudas

cs.LG updates on arXiv.org

Programmatic weak supervision methods speed up the labeling of large datasets
by using labeling functions (LFs) that encode heuristic data sources. However,
writing accurate LFs requires domain expertise and substantial effort. Recent
advances in pre-trained language models (PLMs) have shown substantial
potential across diverse tasks, yet whether PLMs can write accurate LFs
autonomously remains underexplored. In this work, we address this gap by
introducing DataSculpt, an interactive framework that harnesses PLMs for …
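To make the idea of labeling functions concrete, here is a minimal sketch of generic programmatic weak supervision, assuming a binary sentiment task. It is not DataSculpt's method (which uses PLMs to produce LFs); the LF names, the label constants, and the simple majority-vote aggregator are all hypothetical choices for illustration. Each LF encodes one heuristic and may abstain. Applying every LF to every example yields a label matrix, and the votes in each row are then combined into a single weak label.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label space for a binary sentiment task; -1 means "abstain".
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Each labeling function (LF) encodes one heuristic and may abstain.
def lf_contains_great(text: str) -> int:
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text: str) -> int:
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN

def lf_exclamation(text: str) -> int:
    # A deliberately noisy heuristic: a trailing "!" suggests enthusiasm.
    return POSITIVE if text.endswith("!") else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_contains_great, lf_contains_terrible, lf_exclamation]

def apply_lfs(texts: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    """Build the label matrix: one row per example, one column per LF."""
    return [[lf(t) for lf in lfs] for t in texts]

def majority_vote(row: List[int]) -> int:
    """Combine non-abstaining votes; ties and all-abstain rows stay unlabeled."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting LFs with equal support
    return top

texts = [
    "The food was great!",
    "Terrible service.",
    "It was okay.",
    "Terrible, but the view was great",
]
L = apply_lfs(texts, LFS)
labels = [majority_vote(row) for row in L]
print(labels)  # [1, 0, -1, -1]
```

Majority vote is only the simplest aggregator. Weak supervision frameworks commonly replace it with a learned label model that estimates each LF's accuracy. The unlabeled rows (here, the neutral and conflicting examples) show why LF coverage and precision matter.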