Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models. (arXiv:2209.10064v1 [stat.ML]) | allainews.com

Sept. 22, 2022, 1:13 a.m. | Rui Miao, Zhengling Qi, Xiaoke Zhang

stat.ML updates on arXiv.org

We study the problem of off-policy evaluation (OPE) for episodic Partially
Observable Markov Decision Processes (POMDPs) with continuous states. Motivated
by the recently proposed proximal causal inference framework, we develop a
non-parametric identification result for estimating the policy value via a
sequence of so-called V-bridge functions with the help of time-dependent proxy
variables. We then develop a fitted-Q-evaluation-type algorithm to estimate
V-bridge functions recursively, where a non-parametric instrumental variable
(NPIV) problem is solved at each step. By analyzing this challenging …
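
The abstract says that each step of the recursion solves a non-parametric instrumental variable (NPIV) problem, but it does not say which estimator the authors use. The sketch below shows one standard way to solve a generic NPIV problem, sieve two-stage least squares with polynomial bases. It is an assumption for illustration and not the paper's method. The problem it solves is to find h such that E[Y - h(W) | Z] = 0. Here W stands for the inputs of the unknown function and Z for the instrument. Which time-dependent proxies play these roles, and how the response Y is built from rewards, the next-step V-bridge estimate and the policy, is not given in the abstract. The names `poly_basis`, `sieve_2sls`, `deg_w`, `deg_z` and `ridge` are hypothetical.

```python
import numpy as np


def poly_basis(x, degree):
    """Polynomial sieve basis [1, x, x^2, ..., x^degree] for a 1-D array x."""
    x = np.asarray(x, dtype=float).reshape(-1)
    return np.vander(x, N=degree + 1, increasing=True)


def sieve_2sls(y, w, z, deg_w=3, deg_z=5, ridge=1e-8):
    """Hypothetical helper: estimate h in E[Y - h(W) | Z] = 0 by sieve 2SLS.

    This is a generic NPIV estimator chosen for illustration; it is an
    assumption, not the estimator used in the paper. Returns the coefficients
    of h on the polynomial basis of W (deg_z must be >= deg_w).
    """
    y = np.asarray(y, dtype=float).reshape(-1)
    psi = poly_basis(w, deg_w)  # n x k basis for the unknown function h(W)
    phi = poly_basis(z, deg_z)  # n x m basis for the instrument Z, m >= k
    # First stage: project the W-basis onto the span of the Z-basis
    # (a tiny ridge term keeps the Gram matrix invertible).
    gram_z = phi.T @ phi + ridge * np.eye(phi.shape[1])
    psi_hat = phi @ np.linalg.solve(gram_z, phi.T @ psi)
    # Second stage: solve (Psi' P Psi) beta = Psi' P y, with P the projection.
    gram_w = psi_hat.T @ psi + ridge * np.eye(psi.shape[1])
    return np.linalg.solve(gram_w, psi_hat.T @ y)
```

As a toy check, take z and u as independent standard normals, w = z + u and y = 2w + u. The error u is correlated with w, so ordinary least squares of y on w targets a slope of 2.5. In contrast, `sieve_2sls(y, w, z, deg_w=1, deg_z=2)` uses z as the instrument and should return a slope close to 2. In a fitted-Q-evaluation-type recursion, a solver like this would be called once per time step, moving backward from the final step. The paper's exact construction of each step's response is not reproduced here.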
