Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence
March 12, 2024, 4:41 a.m. | Marcel Hussing, Claas Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton
cs.LG updates on arXiv.org
Abstract: We show that deep reinforcement learning can maintain its ability to learn without resetting network parameters in settings where the number of gradient updates greatly exceeds the number of environment samples. Under such large update-to-data ratios, a recent study by Nikishin et al. (2022) suggested the emergence of a primacy bias, in which agents overfit early interactions and downplay later experience, impairing their ability to learn. In this work, we dissect the phenomena underlying the …
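The regime the abstract describes, gradient updates far outnumbering environment samples, is commonly called a large update-to-data (UTD) ratio. Below is a minimal Python sketch of what such a training loop looks like; it is not the paper's code, and the ReplayBuffer, ToyEnv, ToyAgent classes and the utd_ratio parameter are illustrative assumptions.

# Minimal sketch (not the paper's implementation) of off-policy training
# with a large update-to-data (UTD) ratio: many gradient updates per
# environment sample. All names below are illustrative stand-ins.

import random

class ReplayBuffer:
    """FIFO buffer of transitions, sampled uniformly for updates."""
    def __init__(self, capacity=100_000):
        self.storage = []
        self.capacity = capacity

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size=32):
        return random.sample(self.storage, min(batch_size, len(self.storage)))

class ToyEnv:
    """Stand-in environment: random rewards, episodes end after 10 steps."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), random.random(), self.t >= 10

class ToyAgent:
    """Stand-in agent; a real one would hold a value network and optimizer."""
    def act(self, obs):
        return random.choice([0, 1])

    def update(self, batch):
        pass  # one gradient step on the sampled batch would go here

def train(env, agent, total_env_steps=1_000, utd_ratio=32):
    # With utd_ratio > 1, gradient updates greatly exceed environment
    # samples -- the regime in which the paper studies value
    # overestimation and divergence.
    buffer = ReplayBuffer()
    obs = env.reset()
    for _ in range(total_env_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        buffer.add((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        for _ in range(utd_ratio):  # many updates per single new sample
            agent.update(buffer.sample())

train(ToyEnv(), ToyAgent())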