Web: http://arxiv.org/abs/2205.02976

May 9, 2022, 1:11 a.m. | Hua Zheng, Wei Xie

cs.LG updates on arXiv.org

We extend the idea underlying the success of green simulation assisted policy
gradient (GS-PG) to partial historical trajectory reuse for infinite-horizon
Markov Decision Processes (MDPs). The existing GS-PG method was designed to
learn from complete episodes or process trajectories, which limits its
applicability to low-data environments and online process control. In this
paper, mixture likelihood ratio (MLR) based policy gradient estimation is
used to leverage information from historical state-decision transitions
generated under different behavioral policies. We propose …
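To illustrate the idea of MLR-based transition reuse, here is a minimal sketch, not the authors' implementation: logged transitions are reweighted by the ratio of the target policy's action probability to the probability under the empirical mixture of the behavioral policies that generated them, and those weights scale a standard score-function gradient estimate. The function names and toy data below are illustrative assumptions.

```python
# Minimal sketch of mixture likelihood ratio (MLR) weighting for reusing
# transitions collected under several behavioral policies. Illustrative only.
import numpy as np

def mlr_weights(target_probs, behavior_probs, counts):
    """Importance weights against the mixture of behavioral policies.

    target_probs  : (n,) prob. of each logged action under the target policy
    behavior_probs: (n, K) prob. of each logged action under each of the
                    K behavioral policies that generated the data
    counts        : (K,) number of transitions logged under each policy
    """
    # Mixture density at each logged sample, weighted by how much data
    # each behavioral policy contributed.
    mix = behavior_probs @ (counts / counts.sum())
    return target_probs / mix

def mlr_policy_gradient(grad_log_pi, advantages, weights):
    """MLR-weighted score-function estimate: mean of w * A * grad log pi."""
    return (weights[:, None] * advantages[:, None] * grad_log_pi).mean(axis=0)

# Toy usage: 2 behavioral policies, 5 logged transitions, 3 policy parameters.
rng = np.random.default_rng(0)
target = rng.uniform(0.1, 0.9, size=5)          # pi_theta(a|s) at logged pairs
behave = rng.uniform(0.1, 0.9, size=(5, 2))     # pi_k(a|s) for each policy k
w = mlr_weights(target, behave, counts=np.array([3.0, 2.0]))
g = mlr_policy_gradient(rng.normal(size=(5, 3)), rng.normal(size=5), w)
print(w, g)
```

Because the denominator mixes all behavioral policies rather than using each transition's own logging policy, the weights tend to be better behaved, which is the variance-reduction intuition behind reuse of this kind.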

Tags: arxiv, gradient, optimization, policy, variance
