Feb. 13, 2024, 5:45 a.m. | Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa

cs.LG updates on arXiv.org arxiv.org

Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging because tabular data have intricately varied distributions and mix several data types. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a latent space crafted by a variational autoencoder (VAE). The key advantages of the proposed Tabsyn include (1) Generality: the ability to handle a broad spectrum of data types by …
