Truncated Emphatic Temporal Difference Methods for Prediction and Control. (arXiv:2108.05338v2 [cs.LG] UPDATED)
Web: http://arxiv.org/abs/2108.05338
May 12, 2022, 1:11 a.m. | Shangtong Zhang, Shimon Whiteson
cs.LG updates on arXiv.org
Emphatic Temporal Difference (TD) methods are a class of off-policy
Reinforcement Learning (RL) methods involving the use of followon traces.
Despite the theoretical success of emphatic TD methods in addressing the
notorious deadly triad of off-policy RL, there are still two open problems.
First, followon traces typically suffer from large variance, making them hard
to use in practice. Second, though Yu (2015) confirms the asymptotic
convergence of some emphatic TD methods for prediction problems, there is still
no finite sample …
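The feed truncates the abstract mid-sentence. For readers unfamiliar with followon traces, below is a minimal sketch of off-policy emphatic TD(0) with linear function approximation on a random toy MDP, with a simple cap on the followon trace as a stand-in for the truncation idea the title refers to. The toy MDP, all variable names, and the cap value F_max are illustrative assumptions, not the paper's algorithm or notation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, dim = 5, 2, 3
X = rng.normal(size=(n_states, dim))                              # linear features x(s)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state distribution
R = rng.normal(size=(n_states, n_actions))                        # reward r(s, a)
mu = np.full((n_states, n_actions), 1.0 / n_actions)              # behavior policy (uniform random)
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # target policy
gamma, alpha, F_max = 0.9, 0.01, 50.0                             # F_max: illustrative truncation level

w = np.zeros(dim)             # value estimate: v(s) ~ w @ X[s]
F, rho_prev, s = 0.0, 1.0, 0  # followon trace, previous IS ratio, start state
for t in range(10_000):
    # Followon trace with interest i(s) = 1: F_t = 1 + gamma * rho_{t-1} * F_{t-1}.
    F = 1.0 + gamma * rho_prev * F
    # Followon traces can have very large variance; capping F is a
    # crude illustration of the truncation the paper studies.
    F = min(F, F_max)
    a = rng.choice(n_actions, p=mu[s])
    s2 = rng.choice(n_states, p=P[s, a])
    rho = pi[s, a] / mu[s, a]                         # importance sampling ratio
    delta = R[s, a] + gamma * (w @ X[s2]) - w @ X[s]  # TD error
    w += alpha * F * rho * delta * X[s]               # emphatic TD(0) update
    rho_prev, s = rho, s2
```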
Latest AI/ML/Big Data Jobs
Predictive Ecology Postdoctoral Fellow
@ Lawrence Berkeley National Lab | Berkeley, CA
Data Analyst, Patagonia Action Works
@ Patagonia | Remote
Data & Insights Strategy & Innovation General Manager
@ Chevron Services Company, a division of Chevron U.S.A. Inc. | Houston, TX
Faculty members in Research areas such as Bayesian and Spatial Statistics; Data Privacy and Security; AI/ML; NLP; Image and Video Data Analysis
@ Ahmedabad University | Ahmedabad, India
Director, Applied Mathematics & Computational Research Division
@ Lawrence Berkeley National Lab | Berkeley, CA
Business Data Analyst
@ MainStreet Family Care | Birmingham, AL