Feb. 7, 2024, 5:42 a.m. | Ossi R\"ais\"a Antti Honkela

cs.LG updates on arXiv.org arxiv.org

Recent studies have highlighted the benefits of generating multiple synthetic datasets for supervised learning, from increased accuracy to more effective model selection and uncertainty estimation. These benefits have clear empirical support, but the theoretical understanding of them is currently very light. We seek to increase the theoretical understanding by deriving bias-variance decompositions for several settings of using multiple synthetic datasets. Our theory predicts multiple synthetic datasets to be especially beneficial for high-variance downstream predictors, and yields a simple rule of …

accuracy benefits bias bias-variance clear cs.lg datasets light model selection multiple stat.ml studies supervised learning support synthetic them uncertainty understanding variance

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Senior Software Engineer, Generative AI (C++)

@ SoundHound Inc. | Toronto, Canada