Web: http://arxiv.org/abs/2108.05338

May 12, 2022, 1:11 a.m. | Shangtong Zhang, Shimon Whiteson

cs.LG updates on arXiv.org

Emphatic Temporal Difference (TD) methods are a class of off-policy
Reinforcement Learning (RL) methods involving the use of followon traces.
Despite the theoretical success of emphatic TD methods in addressing the
notorious deadly triad of off-policy RL, there are still two open problems.
First, followon traces typically suffer from large variance, making them hard
to use in practice. Second, though Yu (2015) confirms the asymptotic
convergence of some emphatic TD methods for prediction problems, there is still
no finite sample …
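To make the variance issue concrete, below is a minimal sketch of the followon-trace recursion from emphatic TD (in the form given by Sutton, Mahmood & White, 2016), F_t = γ ρ_{t-1} F_{t-1} + i(S_t), where ρ is the importance-sampling ratio and i(·) the interest function. The function name and the toy ratio sequence are illustrative assumptions, not code from the paper under discussion.

```python
def followon_trace(rhos, gamma, interest=1.0):
    """Compute followon traces F_t = gamma * rho_{t-1} * F_{t-1} + i_t,
    with a constant interest i_t = interest and F_0 = interest."""
    F = interest
    traces = [F]
    for rho in rhos:
        F = gamma * rho * F + interest
        traces.append(F)
    return traces

# When importance-sampling ratios persistently exceed 1/gamma, the trace
# grows geometrically -- the large-variance problem the abstract describes.
print(followon_trace([2.0, 2.0, 2.0], gamma=0.9))
```

With ratios of 2.0 and γ = 0.9, each step roughly multiplies the trace by 1.8, so the followon trace (and hence the variance of the emphatic TD update) blows up over long off-policy trajectories.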

