April 3, 2024, 4:42 a.m. | Kyungbok Lee, Myunghee Cho Paik

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.01830v1 Announce Type: cross
Abstract: We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown. The proposed estimator initially estimates the logging policy and then estimates the value function model by minimizing the asymptotic variance of the estimator while considering the estimating effect of the logging policy. When the logging policy model is correctly specified, DRUnknown achieves the smallest asymptotic variance …

abstract arxiv cs.lg decision estimator evaluation function logging markov novel policy processes robust stat.ml type value

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne