Web: http://arxiv.org/abs/2205.05230

May 12, 2022, 1:11 a.m. | Jordan Erskine, Chris Lehnert

cs.LG updates on arXiv.org

Many hierarchical reinforcement learning algorithms utilise a series of
independent skills as a basis to solve tasks at a higher level of reasoning.
These algorithms don't consider the value of using skills that are cooperative
instead of independent. This paper proposes the Cooperative Consecutive
Policies (CCP) method of enabling consecutive agents to cooperatively solve
long-horizon, multi-stage tasks. The method modifies the policy of each
agent to maximise both its own critic and the next agent's critic.
Cooperatively …
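
The idea of a policy that maximises both its own critic and the next agent's critic can be illustrated with a toy sketch. This is not the authors' implementation: the blending weight `coop_weight`, the convex-combination form of the objective, the choice to evaluate both critics on the same state and action, and the two quadratic critics are all assumptions made purely for illustration.

```python
from typing import Callable, Sequence

# A critic maps (state, action) to an estimated value.
Critic = Callable[[float, float], float]


def cooperative_objective(
    state: float,
    action: float,
    current_critic: Critic,
    next_critic: Critic,
    coop_weight: float,
) -> float:
    """Hypothetical blended objective: (1 - w) * Q_current + w * Q_next.

    coop_weight = 0.0 recovers a purely independent (greedy) agent.
    """
    return (
        (1.0 - coop_weight) * current_critic(state, action)
        + coop_weight * next_critic(state, action)
    )


def best_action(
    state: float,
    candidates: Sequence[float],
    current_critic: Critic,
    next_critic: Critic,
    coop_weight: float,
) -> float:
    """Pick the candidate action with the highest blended objective."""
    return max(
        candidates,
        key=lambda a: cooperative_objective(
            state, a, current_critic, next_critic, coop_weight
        ),
    )


# Toy critics (assumed): the current agent prefers action 1.0,
# while the next agent is best served by action 3.0.
def q_current(state: float, action: float) -> float:
    return -((action - 1.0) ** 2)


def q_next(state: float, action: float) -> float:
    return -((action - 3.0) ** 2)


candidates = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
greedy = best_action(0.0, candidates, q_current, q_next, coop_weight=0.0)
cooperative = best_action(0.0, candidates, q_current, q_next, coop_weight=0.5)
print(greedy, cooperative)
```

With `coop_weight=0.0` the agent picks the action its own critic prefers; with an equal weighting it settles on a compromise that leaves the next agent better placed. A learned policy would follow gradients of such an objective rather than search a fixed candidate list.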
