Jan. 17, 2022, 2:10 a.m. | Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

cs.LG updates on arXiv.org

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to address function approximation errors, which previously led to disappointing performance. However, a direct consequence of pessimism is reduced exploration, running counter to theoretical support for the efficacy of optimism in the face of uncertainty. So which approach is best? In this work, we show that …
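
The abstract is truncated before the paper's answer, but the tension it sets up is concrete: pessimistic value updates (for example, bootstrapping from the minimum of two critics, as in clipped double Q-learning) suppress the value overestimation that hurt earlier deep actor-critic methods, while optimism in the face of uncertainty instead adds an exploration bonus scaled by uncertainty. The sketch below is not the paper's method, only a minimal illustration of the two target rules; the function value_target, the disagreement-based uncertainty proxy, and the optimism coefficient beta are all hypothetical names chosen for this example.

import torch

def value_target(q1, q2, mode="pessimistic", beta=0.5):
    """Combine two critic estimates into one bootstrap target.

    q1, q2: Q-value estimates for the same (state, action) pairs.
    mode:   "pessimistic" takes the elementwise minimum, as in
            clipped double Q-learning; "optimistic" adds a bonus
            proportional to the critics' disagreement.
    beta:   hypothetical optimism coefficient (not from the paper).
    """
    mean = (q1 + q2) / 2
    spread = (q1 - q2).abs() / 2  # disagreement as a crude uncertainty proxy
    if mode == "pessimistic":
        return torch.min(q1, q2)   # lower-confidence-style target
    return mean + beta * spread    # upper-confidence-style target

# Toy usage: two critics disagree about the value of two actions.
q1 = torch.tensor([1.0, 2.0])
q2 = torch.tensor([1.5, 1.0])
print(value_target(q1, q2, "pessimistic"))  # tensor([1.0000, 1.0000])
print(value_target(q1, q2, "optimistic"))   # tensor([1.3750, 1.7500])

Under the pessimistic rule the second action looks no better than the first, so the agent has little incentive to explore it; the optimistic rule rewards exactly the actions the critics disagree about. The paper's question, per the abstract, is which of these two regimes is best.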

arxiv learning optimism reinforcement learning
