April 2, 2024, 7:41 p.m. | Yilei Chen, Aldo Pacchiano, Ioannis Ch. Paschalidis

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.00195v1 Announce Type: new
Abstract: In this work, we focus on the multiple-policy evaluation problem where we are given a set of $K$ target policies and the goal is to evaluate their performance (the expected total rewards) to an accuracy $\epsilon$ with probability at least $1-\delta$. We propose an algorithm named $\mathrm{CAESAR}$ to address this problem. Our approach is based on computing an approximate optimal offline sampling distribution and using the data sampled from it to perform the simultaneous estimation …
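
One way to read the stated guarantee, assuming it is meant to hold simultaneously for all $K$ target policies (the abstract mentions simultaneous estimation, but its text is cut off here). The symbols $\hat{V}^{\pi_k}$ and $V^{\pi_k}$ are illustrative notation, not taken from the paper, for the estimated and true expected total reward of policy $\pi_k$:

$$\Pr\left(\max_{k \in \{1,\dots,K\}} \left|\hat{V}^{\pi_k} - V^{\pi_k}\right| \le \epsilon\right) \ge 1-\delta.$$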
