Web: http://arxiv.org/abs/2205.09328

Sept. 19, 2022, 1:12 a.m. | Zifeng Wang, Jimeng Sun

cs.LG updates on arXiv.org

Tabular data (or tables) are the most widely used data format in machine
learning (ML). However, ML models often assume that the table structure stays
fixed across training and testing. Before ML modeling, heavy data cleaning is
required to merge disparate tables with different columns. This preprocessing
often incurs significant data waste (e.g., removing unmatched columns and
samples). How can we learn ML models from multiple tables with partially
overlapping columns? How can we incrementally update ML models as more columns become available …
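To make the "data waste" point concrete, here is a minimal sketch using two hypothetical toy tables (the column names, values, and helper functions are invented for illustration). It shows how a naive merge that keeps only the columns shared by every table throws away cells. The serialize_row helper is a generic illustration of keeping every cell as a (column name, value) pair so that rows with different schemas share one format; it is an assumption for illustration, not the method proposed in the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: a table is a list of rows, and every row in a
# given table has the same columns. Assumes tables are non-empty.
Row = Dict[str, float]
Table = List[Row]


def shared_columns(tables: List[Table]) -> Set[str]:
    """Columns that appear in every table (the intersection of schemas)."""
    schemas = [set(t[0].keys()) for t in tables]
    return set.intersection(*schemas)


def merge_on_shared_columns(tables: List[Table]) -> Table:
    """Naive merge: keep only the shared columns and discard the rest."""
    keep = sorted(shared_columns(tables))
    return [{c: row[c] for c in keep} for t in tables for row in t]


def count_cells(rows: Table) -> int:
    """Total number of (row, column) cells across all rows."""
    return sum(len(row) for row in rows)


def serialize_row(row: Row) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep every cell as a (column name, value) pair,
    so rows from tables with different schemas share one format."""
    return sorted(row.items())


table_a = [{"age": 63, "bmi": 27.1, "smoker": 1},
           {"age": 50, "bmi": 31.4, "smoker": 0}]
table_b = [{"age": 71, "bmi": 24.9, "glucose": 140}]

all_rows = table_a + table_b
merged = merge_on_shared_columns([table_a, table_b])
dropped = count_cells(all_rows) - count_cells(merged)

print(sorted(shared_columns([table_a, table_b])))  # ['age', 'bmi']
print(f"cells dropped by intersecting columns: {dropped} of {count_cells(all_rows)}")
print([serialize_row(r) for r in table_b])
```

In this toy case the intersection keeps only `age` and `bmi`, so the `smoker` and `glucose` cells (3 of 9) are lost, while the serialized form retains every cell.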


