Web: http://arxiv.org/abs/2204.09315

Sept. 19, 2022, 1:12 a.m. | Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

cs.LG updates on arXiv.org

We introduce a constrained optimization method for policy gradient
reinforcement learning, which uses a virtual trust region to regulate each
policy update. In addition to using the proximity of one single old policy as
the normal trust region, we propose forming a second trust region through
another virtual policy representing a wide range of past policies. We then
constrain the new policy to stay closer to the virtual policy, which is
beneficial if the old policy performs poorly. More importantly, …
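
The idea of regulating an update with two trust regions can be illustrated with a small sketch. The excerpt above does not say how the virtual policy is built or how the constraints are enforced, so everything below is an assumption made for illustration: the virtual policy is taken to be a weighted average of past action distributions, both trust regions are expressed as KL penalties, and the names `mix_policies`, `penalized_objective`, `beta_old`, and `beta_virtual` are hypothetical rather than taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def mix_policies(past_policies, weights):
    """Hypothetical virtual policy: weighted average of past action distributions.

    This is an assumption for illustration; the abstract only says the virtual
    policy represents a wide range of past policies.
    """
    total = sum(weights)
    n_actions = len(past_policies[0])
    return [
        sum(w * policy[a] for w, policy in zip(weights, past_policies)) / total
        for a in range(n_actions)
    ]


def penalized_objective(new, old, virtual, advantages,
                        beta_old=1.0, beta_virtual=1.0):
    """Single-state surrogate objective with two KL penalties.

    The surrogate is the expected advantage under the new policy, which equals
    the importance-weighted advantage under the old policy. Each penalty plays
    the role of one trust region (an assumed penalty form, not the paper's).
    """
    surrogate = sum(pn * adv for pn, adv in zip(new, advantages))
    penalty_old = beta_old * kl_divergence(old, new)
    penalty_virtual = beta_virtual * kl_divergence(virtual, new)
    return surrogate - penalty_old - penalty_virtual


if __name__ == "__main__":
    past = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]
    virtual = mix_policies(past, weights=[1.0, 1.0, 1.0])
    old = past[-1]
    candidate = [0.9, 0.1]
    print(penalized_objective(candidate, old, virtual, advantages=[1.0, -1.0]))
```

Raising `beta_virtual` relative to `beta_old` in this sketch pulls the candidate policy toward the averaged past behaviour, which mirrors the abstract's point that the virtual region helps when the most recent policy performs poorly.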

