Feb. 14, 2024, 5:42 a.m. | Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare

cs.LG updates on arXiv.org

This paper contributes a new approach to distributional reinforcement learning that elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm …
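For context, the abstract builds on the classical successor representation. The sketch below is not from the paper; all names (`P_pi`, `r`, `gamma`) are illustrative. It computes the tabular SR in closed form and shows the separation of transition structure and reward that the distributional SM is described as carrying over to the distributional setting.

```python
import numpy as np

# Minimal tabular sketch of the classical successor representation (SR),
# the object the paper's distributional successor measure generalizes.
# All names here (P_pi, r, gamma) are illustrative, not from the paper.

n_states = 3
gamma = 0.9

# P_pi[s, s'] = probability of moving s -> s' under a fixed policy pi.
P_pi = np.array([
    [0.1, 0.6, 0.3],
    [0.0, 0.5, 0.5],
    [0.3, 0.3, 0.4],
])

# Per-state rewards.
r = np.array([0.0, 1.0, -0.5])

# SR: Psi[s, s'] = E[ sum_t gamma^t * 1{S_t = s'} | S_0 = s ],
# available in closed form when P_pi is known.
Psi = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

# The SR cleanly separates transition structure from reward:
# the value function is just the SR applied to the reward vector.
V = Psi @ r

# Sanity check against the direct Bellman solution (I - gamma * P_pi)^-1 r.
V_direct = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r)
assert np.allclose(V, V_direct)
print(V)
```

Where the SR stores only expected discounted state occupancies, the paper's distributional SM replaces this expectation with a distribution over occupancy measures (a distribution over distributions), so that consequences of a policy beyond the expected return can be recovered for a given reward.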

