Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games. (arXiv:2208.09452v1 [cs.LG])
cs.LG updates on arXiv.org arxiv.org
This paper addresses policy learning in non-stationary environments and games
with continuous actions. Rather than relying on the classical reward-maximization
mechanism, we draw on the follow-the-regularized-leader (FTRL) and mirror descent
(MD) updates to propose PORL, a no-regret-style reinforcement learning algorithm
for continuous-action tasks. We prove that PORL has a last-iterate convergence
guarantee, which is important for adversarial and cooperative games. Empirical
studies show that, in stationary environments such as MuJoCo locomotion control
tasks, PORL performs equally well as, …
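
For readers unfamiliar with the two update rules the abstract names, here is a minimal sketch of generic FTRL and mirror descent on a toy problem. It is not PORL and is not taken from the paper. It assumes unconstrained continuous decision vectors, linear losses, and a squared Euclidean regularizer, a setting in which the two updates are known to produce the same iterates. The function names `ftrl_l2` and `mirror_descent_l2`, the step size `eta`, and the random gradients are all illustrative assumptions.

```python
import random


def ftrl_l2(gradients, eta):
    """FTRL with regularizer ||x||^2 / (2 * eta) on linear losses (toy setting).

    The minimizer of sum_s <g_s, x> + ||x||^2 / (2 * eta) has the closed
    form x = -eta * sum_s g_s, so no inner optimization is needed.
    """
    dim = len(gradients[0])
    cumulative = [0.0] * dim
    iterates = [[0.0] * dim]  # first iterate minimizes the regularizer alone
    for g in gradients:
        cumulative = [c + gi for c, gi in zip(cumulative, g)]
        iterates.append([-eta * c for c in cumulative])
    return iterates


def mirror_descent_l2(gradients, eta):
    """Mirror descent with the Euclidean mirror map, i.e. plain gradient steps."""
    dim = len(gradients[0])
    x = [0.0] * dim
    iterates = [list(x)]
    for g in gradients:
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        iterates.append(list(x))
    return iterates


# Hypothetical loss gradients standing in for feedback from an environment.
rng = random.Random(0)
grads = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]

ftrl_iterates = ftrl_l2(grads, eta=0.1)
md_iterates = mirror_descent_l2(grads, eta=0.1)

# Largest coordinate-wise difference between the two iterate sequences.
max_gap = max(
    abs(a - b)
    for xs, ys in zip(ftrl_iterates, md_iterates)
    for a, b in zip(xs, ys)
)
print(f"max gap between FTRL and MD iterates: {max_gap:.2e}")
```

With other regularizers or with constrained action sets the two updates can differ, which is one reason both ideas appear side by side in work on no-regret learning.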