Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence
March 12, 2024, 4:41 a.m. | Marcel Hussing, Claas Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton
cs.LG updates on arXiv.org
Abstract: We show that deep reinforcement learning can maintain its ability to learn without resetting network parameters in settings where the number of gradient updates greatly exceeds the number of environment samples. Under such large update-to-data ratios, a recent study by Nikishin et al. (2022) suggested the emergence of a primacy bias, in which agents overfit early interactions and downplay later experience, impairing their ability to learn. In this work, we dissect the phenomena underlying the …
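The regime the abstract describes, gradient updates far outnumbering environment samples, is commonly called a large update-to-data (UTD) ratio. Below is a minimal Python sketch of what such a training loop looks like; it is not the paper's code, and the ReplayBuffer, ToyEnv, ToyAgent classes and the utd_ratio parameter are illustrative assumptions.

# Minimal sketch (not the paper's implementation) of off-policy training
# with a large update-to-data (UTD) ratio: many gradient updates per
# environment sample. All names below are illustrative stand-ins.

import random

class ReplayBuffer:
    """FIFO buffer of transitions, sampled uniformly for updates."""
    def __init__(self, capacity=100_000):
        self.storage = []
        self.capacity = capacity

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size=32):
        return random.sample(self.storage, min(batch_size, len(self.storage)))

class ToyEnv:
    """Stand-in environment: random rewards, episodes end after 10 steps."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), random.random(), self.t >= 10

class ToyAgent:
    """Stand-in agent; a real one would hold a value network and optimizer."""
    def act(self, obs):
        return random.choice([0, 1])

    def update(self, batch):
        pass  # one gradient step on the sampled batch would go here

def train(env, agent, total_env_steps=1_000, utd_ratio=32):
    # With utd_ratio > 1, gradient updates greatly exceed environment
    # samples -- the regime in which the paper studies value
    # overestimation and divergence.
    buffer = ReplayBuffer()
    obs = env.reset()
    for _ in range(total_env_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        buffer.add((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        for _ in range(utd_ratio):  # many updates per single new sample
            agent.update(buffer.sample())

train(ToyEnv(), ToyAgent())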