Are LLMs Naturally Good at Synthetic Tabular Data Generation? | allainews.com

June 21, 2024, 4:48 a.m. | Shengzhe Xu, Cho-Ting Lee, Mandar Sharma, Raquib Bin Yousuf, Nikhil Muralidhar, Naren Ramakrishnan

cs.LG updates on arXiv.org arxiv.org

arXiv:2406.14541v1 Announce Type: new
Abstract: Large language models (LLMs) have demonstrated their prowess in generating synthetic text and images; however, their potential for generating tabular data -- arguably the most common data type in business and scientific applications -- is largely underexplored. This paper demonstrates that LLMs, used as-is, or after traditional fine-tuning, are severely inadequate as synthetic table generators. Due to the autoregressive nature of LLMs, fine-tuning with random order permutation runs counter to the importance of modeling functional …

abstract applications arxiv business cs.lg data data generation good however images language language models large language large language models llms paper potential scientific synthetic tabular tabular data text type

More from arxiv.org / cs.LG updates on arXiv.org

Scientific Machine Learning Based Reduced-Order Models for Plasma Turbulence Simulations 13 hours ago | arxiv.org

abstract arxiv build construction +20

LEDITS++: Limitless Image Editing using Text-to-Image Models 13 hours ago | arxiv.org

abstract aim apply arxiv +22

InterVLS: Interactive Model Understanding and Improvement with Vision-Language Surrogates 13 hours ago | arxiv.org

abstract applications arxiv challenges +22

Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor 13 hours ago | arxiv.org

abstract arxiv challenges cs.ai +16

Empathy Detection from Text, Audiovisual, Audio or Physiological Signals: Task Formulations and Machine Learning Methods 13 hours ago | arxiv.org

abstract applications arxiv attention +19

Autoencoder-based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter 13 hours ago | arxiv.org

abstract anomaly anomaly detection arxiv +20

Gradient Coding with Iterative Block Leverage Score Sampling 13 hours ago | arxiv.org

abstract arxiv block coding +17

Contextual Dynamic Pricing with Strategic Buyers 13 hours ago | arxiv.org

abstract arxiv behavior consumer +18

On Convex Data-Driven Inverse Optimal Control for Nonlinear, Non-stationary and Stochastic Systems 13 hours ago | arxiv.org

abstract agent arxiv context +19

AI Focused Biochemistry Postdoctoral Fellow

@ Lawrence Berkeley National Lab | Berkeley, CA

View on ai-jobs.net

Senior Data Engineer

@ Displate | Warsaw

View on ai-jobs.net

PhD Student AI simulation electric drive (f/m/d)

@ Volkswagen Group | Kassel, DE, 34123

View on ai-jobs.net

AI Privacy Research Lead

@ Leidos | 6314 Remote/Teleworker US

View on ai-jobs.net

Senior Platform System Architect, Silicon

@ Google | New Taipei, Banqiao District, New Taipei City, Taiwan

View on ai-jobs.net

Fabrication Hardware Litho Engineer, Quantum AI

@ Google | Goleta, CA, USA

View on ai-jobs.net