Web: http://arxiv.org/abs/2007.11684

June 24, 2022, 1:11 a.m. | Daniel Russo

cs.LG updates on arXiv.org

Folklore suggests that policy gradient can be more robust to misspecification
than its relative, approximate policy iteration. This paper studies the case of
state-aggregated representations, where the state space is partitioned and
either the policy or the value function approximation is held constant over
partitions. It shows that a policy gradient method converges to a policy
whose per-period regret is bounded by $\epsilon$, the largest difference
between two elements of the state-action value function belonging to a common
partition. With the …
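For intuition, here is a minimal sketch of the setting the abstract describes: a softmax policy whose parameters are shared across all states in a partition, updated by exact policy gradients on a small random MDP. The toy MDP, the aggregation map phi, the step size, and the iteration count are all illustrative assumptions, not taken from the paper.

    import numpy as np

    # Toy MDP: 4 states aggregated into 2 partitions, 2 actions.
    # All quantities below are illustrative assumptions, not from the paper.
    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 4, 2, 0.9
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
    R = rng.uniform(size=(n_states, n_actions))                       # rewards
    mu = np.full(n_states, 1.0 / n_states)                            # start distribution
    phi = np.array([0, 0, 1, 1])   # aggregation map: state -> partition index
    n_parts = phi.max() + 1

    theta = np.zeros((n_parts, n_actions))  # one shared logit vector per partition

    def policy(theta):
        # States in the same partition share one set of action probabilities.
        logits = theta[phi]                                   # (n_states, n_actions)
        z = np.exp(logits - logits.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def q_and_occupancy(pi):
        # Exact policy evaluation: V = (I - gamma P_pi)^{-1} r_pi,
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s, a, s') V(s'),
        # d = (1 - gamma) * mu^T (I - gamma P_pi)^{-1}   (discounted occupancy).
        P_pi = np.einsum('sa,sat->st', pi, P)
        r_pi = (pi * R).sum(axis=1)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        Q = R + gamma * (P @ V)
        d = (1 - gamma) * np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, mu)
        return Q, V, d

    for _ in range(2000):
        pi = policy(theta)
        Q, V, d = q_and_occupancy(pi)
        # Softmax policy gradient, accumulated over the states in each partition:
        # grad[k, a] = sum_{s : phi(s) = k} d(s) * pi(a|s) * (Q(s, a) - V(s)).
        per_state_grad = d[:, None] * pi * (Q - V[:, None])
        grad = np.zeros_like(theta)
        np.add.at(grad, phi, per_state_grad)
        theta += 0.5 * grad

    # The epsilon of the abstract: largest gap in the state-action value function
    # between two states sharing a partition (measured here at the learned policy).
    Q, V, d = q_and_occupancy(policy(theta))
    eps = max(np.ptp(Q[phi == k, a]) for k in range(n_parts) for a in range(n_actions))
    print("epsilon at learned policy:", eps)

Because the logits live at the partition level, the gradient for a partition is the occupancy-weighted sum of the per-state softmax gradients, which is what the np.add.at scatter computes.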

