Off-Policy Actor-Critic with Emphatic Weightings. (arXiv:2111.08172v3 [cs.LG] UPDATED)
A variety of theoretically sound policy gradient algorithms exist for the
on-policy setting due to the policy gradient theorem, which provides a
simplified form for the gradient. The off-policy setting, however, has been
less clear due to the existence of multiple objectives and the lack of an
explicit off-policy policy gradient theorem. In this work, we unify these
objectives into one off-policy objective, and provide a policy gradient theorem
for this unified objective. The derivation involves emphatic weightings and
interest functions. …
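
As a rough illustration of what an emphatic weighting looks like in practice, the sketch below computes per-step weights along one trajectory using the followon-trace recursion familiar from the emphatic TD literature. It is not taken from the paper: the function name `emphatic_weights`, the `trade_off` parameter, and its convention (1 uses the full followon trace, 0 falls back to the interest alone) are assumptions made for this example.

```python
def emphatic_weights(rhos, interests, gamma=0.9, trade_off=1.0):
    """Return one emphatic weighting per time step of a trajectory.

    rhos[t]      -- importance ratio pi(a_t | s_t) / mu(a_t | s_t)
    interests[t] -- user-chosen interest i(s_t) >= 0
    gamma        -- discount factor
    trade_off    -- hypothetical knob: 1.0 = full followon trace,
                    0.0 = interest only (no emphatic correction)
    """
    followon = 0.0   # followon trace F_t, starts empty
    prev_rho = 1.0   # rho_{t-1}; irrelevant at t = 0 because followon is 0
    weights = []
    for rho, interest in zip(rhos, interests):
        # F_t = gamma * rho_{t-1} * F_{t-1} + i(s_t)
        followon = gamma * prev_rho * followon + interest
        # M_t interpolates between the interest and the followon trace
        weights.append((1.0 - trade_off) * interest + trade_off * followon)
        prev_rho = rho
    return weights


# Example trajectory: uniform interest, behaviour and target policies agree.
print(emphatic_weights([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5))
```

In an actor-critic loop, a weight like this would typically scale the actor's gradient step for the corresponding time step; how the paper's algorithm uses it exactly is not specified in the excerpt above.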