Web: http://arxiv.org/abs/2108.05828

Jan. 12, 2022, 2:11 a.m. | Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

cs.LG updates on arXiv.org

Common policy gradient methods rely on the maximization of a sequence of
surrogate functions. In recent years, many such surrogate functions have been
proposed, most without strong theoretical guarantees, leading to algorithms
such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we
instead propose a general framework (FMA-PG) based on functional mirror ascent
that gives rise to an entire family of surrogate functions. We construct
surrogate functions that enable policy improvement guarantees, a property not
shared by most existing surrogate functions. Crucially, these guarantees hold
regardless …

