Web: http://arxiv.org/abs/2110.09712

June 23, 2022, 1:11 a.m. | Sicen Li, Qinyun Tang, Yiming Pang, Xinmeng Ma, Gang Wang

cs.LG updates on arXiv.org

This paper proposes a reinforcement learning framework that improves the
exploration-exploitation trade-off by learning a range of policies with
respect to various confidence bounds. Underestimated values provide stable
updates but lead to inefficient exploration. Overestimated values, on the
other hand, can help the agent escape local optima, but they may cause
over-exploration of low-value areas and the accumulation of function
approximation errors. Algorithms have been proposed to mitigate this
tension. However, we lack an understanding of how the value bias impacts …
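
To make the contrast between underestimated and overestimated values concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a common construction in which a confidence bound is the mean of an ensemble of value estimates plus a coefficient times their standard deviation. The function name `confidence_bound`, the coefficient `beta`, and the numbers in `q_ensemble` are all hypothetical, chosen only for illustration.

```python
import numpy as np


def confidence_bound(q_ensemble, beta):
    """Mean plus beta times the (population) std across ensemble members.

    Assumption for illustration only: beta < 0 gives a pessimistic
    (lower) bound, beta > 0 gives an optimistic (upper) bound.
    """
    q = np.asarray(q_ensemble, dtype=float)
    return q.mean(axis=0) + beta * q.std(axis=0)


# Hypothetical Q-value estimates: 2 critics (rows) x 3 actions (columns).
# The critics agree on action 0 and disagree most on action 2.
q_ensemble = np.array([
    [1.0, 2.0, 0.0],
    [1.0, 0.0, 4.0],
])

lower = confidence_bound(q_ensemble, beta=-1.0)  # pessimistic estimate
upper = confidence_bound(q_ensemble, beta=1.0)   # optimistic estimate

# Greedy action under each bound.
greedy_lower = int(np.argmax(lower))
greedy_upper = int(np.argmax(upper))

print("lower bound:", lower, "-> action", greedy_lower)
print("upper bound:", upper, "-> action", greedy_upper)
```

In this toy case the pessimistic bound prefers the action on which the critics agree, while the optimistic bound prefers the action with the largest disagreement. This mirrors the stable-but-conservative versus exploratory-but-error-prone behavior described in the abstract.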

