Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models. (arXiv:2209.10064v1 [stat.ML])
stat.ML updates on arXiv.org
We study the problem of off-policy evaluation (OPE) for episodic Partially
Observable Markov Decision Processes (POMDPs) with continuous states. Motivated
by the recently proposed proximal causal inference framework, we develop a
non-parametric identification result for estimating the policy value via a
sequence of so-called V-bridge functions with the help of time-dependent proxy
variables. We then develop a fitted-Q-evaluation-type algorithm to estimate
V-bridge functions recursively, where a non-parametric instrumental variable
(NPIV) problem is solved at each step. By analyzing this challenging …
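
To make the "NPIV problem solved at each step" concrete, here is a minimal sketch of one such solve using a sieve two-stage least squares estimator. This is a standard NPIV technique swapped in for illustration and is not necessarily the estimator used in the paper. The function names (`poly_basis`, `sieve_npiv`, `simulate`), the polynomial bases, and the synthetic data are all assumptions. The goal is to find a function h such that E[Y - h(W) | Z] = 0, where W is confounded by an unobserved variable and Z serves as an instrument.

```python
import numpy as np

def poly_basis(x, degree):
    """Polynomial sieve basis [1, x, ..., x^degree] for a 1-D array x."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)

def sieve_npiv(y, w, z, deg_w=1, deg_z=2):
    """Sieve 2SLS: coefficients beta with E[y - psi(w) @ beta | z] ~ 0.

    Hypothetical helper for illustration; basis choices are assumptions.
    """
    psi = poly_basis(w, deg_w)   # basis for the unknown function h(w)
    phi = poly_basis(z, deg_z)   # basis spanning functions of the instrument z
    # Stage 1: project each column of psi onto the span of phi.
    psi_hat = phi @ np.linalg.lstsq(phi, psi, rcond=None)[0]
    # Stage 2: least squares of y on the projected basis.
    return np.linalg.lstsq(psi_hat, y, rcond=None)[0]

def simulate(n=5000, seed=0):
    """Synthetic data (assumed): u is unobserved and confounds w and y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    w = z + u
    # True structural function: h(w) = 1 + 2 w; the -3 u term is the confounding.
    y = 1.0 + 2.0 * w - 3.0 * u + 0.1 * rng.normal(size=n)
    return y, w, z

y, w, z = simulate()
beta_npiv = sieve_npiv(y, w, z)                                  # uses the instrument
beta_ols = np.linalg.lstsq(poly_basis(w, 1), y, rcond=None)[0]   # ignores confounding
print("NPIV coefficients:", beta_npiv)
print("OLS coefficients: ", beta_ols)
```

In this synthetic setup the instrument-based estimate should land near the true coefficients (1, 2), while plain least squares is pulled away by the unobserved confounder. In a backward recursion of the kind the abstract describes, a solver like this would be called once per time step; how the response variable is built from rewards and the next-step bridge function is specified in the paper, not in this sketch.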