all AI news
Statistical Efficiency of Distributional Temporal Difference
March 12, 2024, 4:42 a.m. | Yang Peng, Liangyu Zhang, Zhihua Zhang
cs.LG updates on arXiv.org arxiv.org
Abstract: Distributional reinforcement learning (DRL), which cares about the full distribution of returns instead of just the mean, has achieved empirical success in various domains. One of the core tasks in the field of DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$. A distributional temporal difference (TD) algorithm has been accordingly proposed, which is an extension of the temporal difference algorithm in the classic RL literature. In …
abstract arxiv core cs.lg difference distribution domains efficiency evaluation mean policy reinforcement reinforcement learning returns statistical stat.ml success tasks temporal type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Software Engineer, Data Tools - Full Stack
@ DoorDash | Pune, India
Senior Data Analyst
@ Artsy | New York City