Learning to Constrain Policy Optimization with Virtual Trust Region. (arXiv:2204.09315v2 [cs.LG] UPDATED)
Sept. 19, 2022, 1:12 a.m. | Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh
cs.LG updates on arXiv.org
We introduce a constrained optimization method for policy gradient
reinforcement learning that uses a virtual trust region to regulate each
policy update. In addition to the usual trust region, defined by proximity to
a single old policy, we propose forming a second trust region through a
virtual policy that represents a wide range of past policies. We then
constrain the new policy to stay closer to the virtual policy, which is
beneficial if the old policy performs poorly. More importantly, …
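
The two-trust-region idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say how the virtual policy is built or how the constraints are enforced, so the code below assumes a uniform average of past action distributions as a stand-in virtual policy and turns both constraints into KL penalties on a standard importance-weighted surrogate. The names `average_policy`, `penalized_objective`, `beta_old`, and `beta_virtual` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def average_policy(past_policies):
    """Assumed stand-in for a virtual policy: uniform average of past policies."""
    n = len(past_policies)
    return [sum(column) / n for column in zip(*past_policies)]


def penalized_objective(new_p, old_p, virtual_p, advantages,
                        beta_old=1.0, beta_virtual=1.0):
    """Surrogate objective for one state, minus two KL penalties.

    The first penalty keeps the new policy near the single old policy
    (the usual trust region); the second keeps it near the virtual policy.
    """
    # Importance-weighted advantage, with actions sampled from the old policy.
    surrogate = sum(o * (n / o) * a
                    for n, o, a in zip(new_p, old_p, advantages))
    penalty_old = beta_old * kl_divergence(old_p, new_p)
    penalty_virtual = beta_virtual * kl_divergence(virtual_p, new_p)
    return surrogate - penalty_old - penalty_virtual


if __name__ == "__main__":
    past = [[0.5, 0.5], [0.1, 0.9], [0.3, 0.7]]   # made-up past policies
    old = past[-1]                                 # most recent policy
    virtual = average_policy(past)
    candidate = [0.4, 0.6]
    advantages = [1.0, -1.0]                       # made-up advantage estimates
    print(penalized_objective(candidate, old, virtual, advantages))
```

Raising `beta_virtual` relative to `beta_old` pulls the update toward the averaged past behaviour, which is the situation the abstract describes as helpful when the most recent policy performs poorly.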