Web: http://arxiv.org/abs/2007.11684

June 24, 2022, 1:11 a.m. | Daniel Russo

cs.LG updates on arXiv.org

Folklore suggests that policy gradient can be more robust to misspecification
than its relative, approximate policy iteration. This paper studies the case of
state-aggregated representations, where the state space is partitioned and
either the policy or value function approximation is held constant over
partitions. It shows that a policy gradient method converges to a policy
whose per-period regret is bounded by $\epsilon$, the largest difference
between two elements of the state-action value function belonging to a common
partition. With the …
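
To make the quantity $\epsilon$ concrete, here is a minimal Python sketch that computes the largest within-partition gap of a tabular state-action value function. It is an illustration only, not the paper's method: the function name `aggregation_gap`, the toy table `q`, and the labels in `partition` are all hypothetical, and the sketch assumes the two compared elements share the same action, which the abstract does not spell out.

```python
from collections import defaultdict


def aggregation_gap(q, partition):
    """Largest within-partition spread of a tabular Q function.

    q         -- list of rows, one per state; q[s][a] is the value of action a in state s
    partition -- list of labels, one per state; equal labels mean the states are aggregated
    Assumption (not stated in the abstract): values are compared for the same action.
    """
    # Group the Q rows by partition label.
    groups = defaultdict(list)
    for row, label in zip(q, partition):
        groups[label].append(row)

    gap = 0.0
    for rows in groups.values():
        # zip(*rows) iterates over actions; each tuple holds that action's
        # value in every state of the current partition.
        for action_values in zip(*rows):
            gap = max(gap, max(action_values) - min(action_values))
    return gap


# Hypothetical toy problem: 4 states, 2 actions, states {0, 1} and {2, 3} aggregated.
q = [
    [1.0, 0.5],
    [1.2, 0.4],
    [3.0, 2.0],
    [3.1, 2.5],
]
partition = [0, 0, 1, 1]
eps = aggregation_gap(q, partition)
```

In this toy table the largest spread comes from the second action in the second partition (2.5 versus 2.0). If every state sits in its own partition, the gap is zero, which corresponds to an exact tabular representation.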


