Feb. 13, 2024, 5:42 a.m. | Kaiwen Wang Owen Oertell Alekh Agarwal Nathan Kallus Wen Sun

cs.LG updates on arXiv.org arxiv.org

In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the return distribution, can obtain second-order bounds in both online and offline RL in general settings with function approximation. Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributional RL. To the best of our knowledge, our results are the first second-order bounds for low-rank MDPs and for offline RL. When specializing to …

approximation benefits cs.lg distribution function general instance offline paper prove reinforcement reinforcement learning scale variance

Research Scholar (Technical Research)

@ Centre for the Governance of AI | Hybrid; Oxford, UK

HPC Engineer (x/f/m) - DACH

@ Meshcapade GmbH | Remote, Germany

Data Engineering Director-Big Data technologies (Hadoop, Spark, Hive, Kafka)

@ Visa | Bengaluru, India

Senior Data Engineer

@ Manulife | Makati City, Manulife Philippines Head Office

GDS Consulting Senior Data Scientist 2

@ EY | Taguig, PH, 1634

IT Data Analyst Team Lead

@ Rosecrance | Rockford, Illinois, United States