Feb. 9, 2024, 5:42 a.m. | Yunhao Tang Mark Rowland R\'emi Munos Bernardo \'Avila Pires Will Dabney

cs.LG updates on arXiv.org arxiv.org

We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distributional Q($\lambda$) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q($\lambda$) and validate theoretical insights with tabular experiments. We show how distributional Q($\lambda$)-C51, a combination of Q($\lambda$) with the C51 agent, exhibits promising results on deep …

algorithms apply cs.lg evaluation family importance interactions lambda policy sampling

Research Scholar (Technical Research)

@ Centre for the Governance of AI | Hybrid; Oxford, UK

HPC Engineer (x/f/m) - DACH

@ Meshcapade GmbH | Remote, Germany

Director of Machine Learning

@ Axelera AI | Hybrid/Remote - Europe (incl. UK)

Senior Data Scientist - Trendyol Milla

@ Trendyol | Istanbul (All)

Data Scientist, Mid

@ Booz Allen Hamilton | USA, CA, San Diego (1615 Murray Canyon Rd)

Systems Development Engineer , Amazon Robotics Business Applications and Solutions Engineering

@ Amazon.com | Boston, Massachusetts, USA