A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes. (arXiv:2111.06784v4 [cs.LG] UPDATED)
Web: http://arxiv.org/abs/2111.06784
cs.LG updates on arXiv.org
We consider off-policy evaluation (OPE) in Partially Observable Markov
Decision Processes (POMDPs), where the evaluation policy depends only on
observable variables and the behavior policy depends on unobservable latent
variables. Existing works either assume no unmeasured confounders or focus on
settings where both the observation and the state spaces are tabular. In this
work, we first propose novel identification methods for OPE in POMDPs with
latent confounders, by introducing bridge functions that link the target
policy's value and the observed …
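
The abstract is truncated above, but the "minimax learning" in the title points at a standard recipe for estimating bridge functions of this kind: recast the conditional moment restriction they satisfy as an adversarial objective over two function classes. A generic sketch of that objective (an illustration of the general technique, not necessarily the paper's exact estimator; the residual rho, conditioning variable X, and function classes below are assumptions):

    \hat{b} \in \arg\min_{b \in \mathcal{B}} \; \sup_{f \in \mathcal{F}} \; \mathbb{E}_n\!\big[ f(X)\,\rho(b; D) \big] - \lambda \,\|f\|_{\mathcal{F}}^2

Here \rho(b; D) is the residual of the conditional moment restriction \mathbb{E}[\rho(b; D) \mid X] = 0 that identifies the bridge function b, \mathcal{B} and \mathcal{F} are user-chosen function classes (e.g., RKHSs or neural networks), the inner supremum over test functions f enforces the restriction adversarially, and the \lambda penalty regularizes the inner maximization.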