March 13, 2024, 4:43 a.m. | Shirong Xu, Will Wei Sun, Guang Cheng

cs.LG updates on arXiv.org

arXiv:2305.10015v2 Announce Type: replace-cross
Abstract: Synthetic data algorithms are widely employed in industry to generate artificial data for downstream learning tasks. While existing research primarily focuses on empirically evaluating the utility of synthetic data, a theoretical understanding of that utility is largely lacking. This paper bridges the practice-theory gap by establishing relevant utility theory in a statistical learning framework. It considers two utility metrics: generalization and ranking of models trained on synthetic data. The former is defined as the generalization difference between models trained …

