Sept. 19, 2022, 1:12 a.m. | Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

cs.LG updates on arXiv.org arxiv.org

We introduce a constrained optimization method for policy gradient
reinforcement learning that uses a virtual trust region to regulate each
policy update. In addition to the standard trust region formed by the
proximity to a single old policy, we propose a second trust region built
around a virtual policy that represents a wide range of past policies. We
then constrain the new policy to stay close to this virtual policy, which
is beneficial when the old policy performs poorly. More importantly, …
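The idea in the abstract can be illustrated with a minimal sketch. The helper names, the KL-penalty form of the constraint, and the exponential-moving-average construction of the virtual policy below are all assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def kl(p, q):
    # KL divergence between two categorical distributions.
    return float(np.sum(p * np.log(p / q)))

def dual_trust_region_penalty(new_pi, old_pi, virtual_pi,
                              beta_old=1.0, beta_virtual=1.0):
    # Penalize divergence from both the single old policy (the usual
    # trust region) and a virtual policy summarizing past policies.
    return (beta_old * kl(new_pi, old_pi)
            + beta_virtual * kl(new_pi, virtual_pi))

def update_virtual_policy(virtual_pi, old_pi, alpha=0.1):
    # One simple way to maintain a virtual policy: an exponential
    # moving average over past policy distributions (an assumption,
    # not necessarily the paper's construction).
    mixed = (1.0 - alpha) * virtual_pi + alpha * old_pi
    return mixed / mixed.sum()

# Toy usage: categorical policies over three actions.
new_pi = np.array([0.7, 0.2, 0.1])
old_pi = np.array([0.6, 0.3, 0.1])
virtual_pi = np.array([0.5, 0.3, 0.2])

penalty = dual_trust_region_penalty(new_pi, old_pi, virtual_pi)
virtual_pi = update_virtual_policy(virtual_pi, old_pi)
```

In a full policy-gradient objective this penalty term would be subtracted from (or used to clip) the surrogate advantage loss, so that updates that drift far from either reference policy are discouraged.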

arxiv optimization policy trust virtual
