Feb. 27, 2024, 5:45 a.m. | Laura Battaglia, Timothy Christensen, Stephen Hansen, Szymon Sacher

stat.ML updates on arXiv.org arxiv.org

arXiv:2402.15585v1 Announce Type: cross
Abstract: The leading strategy for analyzing unstructured data uses two steps. First, latent variables of economic interest are estimated with an upstream information retrieval model. Second, the estimates are treated as "data" in a downstream econometric model. We establish theoretical arguments for why this two-step strategy leads to biased inference in empirically plausible settings. More constructively, we propose a one-step strategy for valid inference that uses the upstream and downstream models jointly. The one-step strategy (i) …
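The abstract's claim that the two-step strategy biases inference can be illustrated with a minimal simulation sketch. It assumes one familiar mechanism, classical measurement error in the upstream estimate, which may not be the mechanism the paper analyzes. The function `simulate_two_step`, its parameters, and the linear-Gaussian data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_two_step(n=20000, beta=2.0, noise_sd=1.0, seed=0):
    """Compare a regression on the true latent variable with a regression
    on a noisy estimate of it (hypothetical setup, not the paper's model)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)                       # true latent variable, variance 1
    # Assumed upstream step: the estimate equals the truth plus independent noise.
    theta_hat = theta + rng.normal(scale=noise_sd, size=n)
    y = beta * theta + rng.normal(size=n)            # downstream outcome
    slope_true = np.polyfit(theta, y, 1)[0]          # infeasible benchmark
    slope_two_step = np.polyfit(theta_hat, y, 1)[0]  # estimate treated as "data"
    return slope_true, slope_two_step

slope_true, slope_two_step = simulate_two_step()
print(f"slope on true latent variable: {slope_true:.3f}")
print(f"slope on estimated variable:   {slope_two_step:.3f}")
```

Under these assumptions the two-step slope is pulled toward zero by the factor 1 / (1 + noise_sd**2), so collecting more observations does not remove the bias.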

