Sept. 22, 2022, 1:13 a.m. | Rui Miao, Zhengling Qi, Xiaoke Zhang

stat.ML updates on arXiv.org

We study the problem of off-policy evaluation (OPE) for episodic Partially
Observable Markov Decision Processes (POMDPs) with continuous states. Motivated
by the recently proposed proximal causal inference framework, we develop a
non-parametric identification result for estimating the policy value via a
sequence of so-called V-bridge functions with the help of time-dependent proxy
variables. We then develop a fitted-Q-evaluation-type algorithm to estimate
V-bridge functions recursively, where a non-parametric instrumental variable
(NPIV) problem is solved at each step. By analyzing this challenging …
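To make the "NPIV problem solved at each step" concrete, here is a minimal sketch of one generic NPIV solve using a sieve two-stage least squares estimator. It is not the authors' estimator: the paper's exact moment condition for the V-bridge functions is not given in this excerpt, so the sketch solves the generic condition E[y - h(w) | z] = 0 with h restricted to a polynomial basis. The names `poly_features` and `sieve_npiv`, the polynomial bases, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


def poly_features(x, degree):
    """Polynomial basis [1, x, ..., x^degree] for a 1-D array x."""
    x = np.asarray(x, dtype=float).ravel()
    return np.vander(x, N=degree + 1, increasing=True)


def sieve_npiv(y, w, z, degree_w=1, degree_z=2, ridge=1e-8):
    """Sieve 2SLS for E[y - h(w) | z] = 0 with h(w) = phi(w)^T theta.

    Hypothetical helper for illustration only.
    y: outcomes, w: endogenous regressor, z: instrument (all 1-D arrays).
    Returns the coefficient vector theta of length degree_w + 1.
    """
    y = np.asarray(y, dtype=float).ravel()
    phi = poly_features(w, degree_w)  # basis for the unknown function h
    psi = poly_features(z, degree_z)  # basis spanning functions of z

    # First stage: project phi(w) onto the instrument basis.
    gram = psi.T @ psi + ridge * np.eye(psi.shape[1])
    phi_hat = psi @ np.linalg.solve(gram, psi.T @ phi)

    # Second stage: regress y on the projected basis.
    lhs = phi_hat.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(lhs, phi_hat.T @ y)


if __name__ == "__main__":
    # Simulated data (an assumption): u is an unobserved confounder
    # affecting both w and y, while z shifts w but is independent of u.
    rng = np.random.default_rng(0)
    n = 20000
    u = rng.normal(size=n)
    z = rng.normal(size=n)
    w = z + u + 0.5 * rng.normal(size=n)
    y = 1.0 + 2.0 * w + 3.0 * u + rng.normal(size=n)
    print("sieve NPIV coefficients:", sieve_npiv(y, w, z))
```

In a fitted-Q-evaluation-type recursion, a solve of this kind would be repeated backward in time, with the target `y` at each step built from the reward and the next step's fitted function; how the paper constructs that target and chooses its function classes is not specified here.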

