Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training. (arXiv:2209.01054v1 [cs.MA])
cs.LG updates on arXiv.org arxiv.org
Centralised training (CT) is the basis for many popular multi-agent
reinforcement learning (MARL) methods because it allows agents to quickly learn
high-performing policies. However, CT relies on agents learning from one-off
observations of other agents' actions at a given state. Because MARL agents
explore and update their policies during training, these observations often
provide poor predictions about other agents' behaviour and the expected return
for a given action. CT methods therefore suffer from high variance and
error-prone estimates, harming learning. …