March 21, 2024, 4:43 a.m. | Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao

cs.LG updates on arXiv.org

arXiv:2310.10402v2 Announce Type: replace
Abstract: Synthetic training data has gained prominence in numerous learning tasks and scenarios, offering advantages such as dataset augmentation, generalization evaluation, and privacy preservation. Despite these benefits, synthetic data generated by current methodologies remains less efficient than real data when used exclusively to train advanced deep models, limiting its practical utility. To address this challenge, we analyze the principles underlying training data synthesis for supervised learning and elucidate a principled theoretical framework from the distribution-matching perspective that explicates the …
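The truncated abstract frames data synthesis from a distribution-matching perspective. As an illustration only, and not the paper's actual objective (which is cut off above), distribution matching is commonly operationalized by minimizing a divergence such as maximum mean discrepancy (MMD) between real and synthetic feature distributions. The sketch below, with hypothetical feature tensors, shows the standard (biased) MMD estimator under an RBF kernel in PyTorch.

```python
import torch

def rbf_mmd2(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between sample sets x and y under an RBF kernel.

    A small value indicates the two feature distributions are close,
    which is the quantity a distribution-matching synthesis objective
    would drive down. This is the simple biased estimator (it includes
    the diagonal terms), sufficient for a sketch.
    """
    def kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Pairwise squared Euclidean distances, then RBF kernel values.
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))

    k_xx = kernel(x, x).mean()
    k_yy = kernel(y, y).mean()
    k_xy = kernel(x, y).mean()
    return k_xx + k_yy - 2 * k_xy

# Toy usage with hypothetical encoder features (names are illustrative):
real_feats = torch.randn(256, 128)          # features of a real-data batch
synth_feats = torch.randn(256, 128) + 0.5   # shifted synthetic-data features
print(rbf_mmd2(real_feats, synth_feats).item())  # > 0; shrinks as distributions align
```

In a synthesis pipeline, a term like this could be backpropagated through the generator so that synthetic batches match the real feature distribution; the kernel bandwidth sigma is a free choice here.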

