Feb. 5, 2024, 6:44 a.m. | Dimitris Bertsimas Arthur Delarue Jean Pauphilet

cs.LG updates on arXiv.org arxiv.org

Missing data is a common issue in real-world datasets. This paper studies the performance of impute-then-regress pipelines by contrasting theoretical and empirical evidence. We establish the asymptotic consistency of such pipelines for a broad family of imputation methods. While common sense suggests that a `good' imputation method produces datasets that are plausible, we show, on the contrary, that, as far as prediction is concerned, crude can be good. Among others, we find that mode-impute is asymptotically sub-optimal, while mean-impute is …

common sense cs.lg data datasets evidence family good imputation issue paper performance pipelines prediction rules sense simple stat.ml studies world

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Intern Large Language Models Planning (f/m/x)

@ BMW Group | Munich, DE

Data Engineer Analytics

@ Meta | Menlo Park, CA | Remote, US