Web: http://arxiv.org/abs/2111.06784

June 17, 2022, 1:11 a.m. | Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang

cs.LG updates on arXiv.org

We consider off-policy evaluation (OPE) in Partially Observable Markov
Decision Processes (POMDPs), where the evaluation policy depends only on
observable variables and the behavior policy depends on unobservable latent
variables. Existing works either assume no unmeasured confounders or focus on
settings where both the observation and the state spaces are tabular. In this
work, we first propose novel identification methods for OPE in POMDPs with
latent confounders, by introducing bridge functions that link the target
policy's value and the observed …
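To make the OPE problem concrete, below is a minimal sketch of ordinary importance-sampling OPE in a fully observed tabular MDP — precisely the kind of estimator that becomes biased when, as in this paper's setting, the behavior policy conditions on unobserved latent variables. This is an illustrative toy (the MDP, policies, and rewards are invented), not the paper's minimax bridge-function method.

```python
import numpy as np

# Hypothetical toy setup: 2 states, 2 actions, horizon 3, no confounding.
rng = np.random.default_rng(0)
n_states, n_actions, horizon = 2, 2, 3
pi_b = np.array([[0.5, 0.5], [0.5, 0.5]])  # behavior policy (uniform)
pi_e = np.array([[0.9, 0.1], [0.2, 0.8]])  # evaluation (target) policy
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state dist
R = rng.random((n_states, n_actions))  # mean reward per (state, action)

def rollout(policy):
    """Sample one trajectory from the given policy; return (path, return)."""
    s, path, ret = 0, [], 0.0
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy[s])
        path.append((s, a))
        ret += R[s, a]
        s = rng.choice(n_states, p=P[s, a])
    return path, ret

# Logged data is collected under the behavior policy only.
data = [rollout(pi_b) for _ in range(20000)]

def ope_is(data):
    """Ordinary importance sampling: reweight each logged trajectory by
    the likelihood ratio of pi_e to pi_b along its state-action sequence."""
    vals = []
    for path, ret in data:
        w = 1.0
        for s, a in path:
            w *= pi_e[s, a] / pi_b[s, a]
        vals.append(w * ret)
    return float(np.mean(vals))

def on_policy(policy, n=20000):
    """On-policy Monte Carlo value estimate, for comparison."""
    return float(np.mean([rollout(policy)[1] for _ in range(n)]))

est = ope_is(data)      # off-policy estimate of pi_e's value
truth = on_policy(pi_e)  # direct Monte Carlo estimate
```

The importance weights here require that the behavior policy's action probabilities depend only on the observed state; when actions also depend on a latent confounder, the ratio pi_e/pi_b is no longer the correct reweighting, which is the gap the paper's bridge functions are designed to close.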

