March 12, 2024, 4:42 a.m. | Yang Peng, Liangyu Zhang, Zhihua Zhang

cs.LG updates on arXiv.org

arXiv:2403.05811v1 Announce Type: cross
Abstract: Distributional reinforcement learning (DRL), which cares about the full distribution of returns instead of just the mean, has achieved empirical success in various domains. One of the core tasks in the field of DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$. A distributional temporal difference (TD) algorithm has been accordingly proposed, which is an extension of the temporal difference algorithm in the classic RL literature. In …

