Web: https://www.reddit.com/r/reinforcementlearning/comments/umfeox/ppo_selfplay_probability_sampling_instead_of/

May 10, 2022, 10:17 a.m. | /u/ghostworld073

Reinforcement Learning reddit.com


I read a paper in which they use PPO to learn a two-player game against one opponent.
They update the network using only one agent's experiences, but the same network controls both agents.
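The setup described above (one shared network, both players acting with it, only one player's experience used for training) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the linear `shared_policy`, the random placeholder observations, and the assumption that players alternate turns are all hypothetical.

```python
import math
import random

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shared_policy(weights, state):
    # Hypothetical linear policy: one weight row per action, logits = W @ state.
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)

def collect_self_play(weights, n_steps, state_dim, rng):
    """Both players query the same policy; only player 0's transitions are stored."""
    learner_buffer = []
    for t in range(n_steps):
        player = t % 2  # assumption: the two players alternate turns
        # Placeholder observation; a real game would derive this from its state.
        state = [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
        probs = shared_policy(weights, state)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        if player == 0:
            # Keep (state, action, log-prob) for the learning agent's PPO update.
            learner_buffer.append((state, action, math.log(probs[action])))
        # Player 1's transitions are discarded (in a real game its move would
        # still change the state the learner sees next).
    return learner_buffer

weights = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.5]]  # hypothetical: 3 actions x 2 features
buffer = collect_self_play(weights, n_steps=10, state_dim=2, rng=random.Random(0))
print(len(buffer))  # 5: only the learner's turns are stored
```

Sharing one set of weights means the opponent automatically improves whenever the learner does, without maintaining a second network.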

My question is: why do they sample the opponent's actions from the policy's probability distribution instead of taking the action with the highest probability, given that we want our opponent to play the best action in each state?
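For concreteness, here is a small sketch of the two action-selection rules being compared, using a hypothetical three-action distribution for a single state. Greedy (argmax) selection returns the same action every time it sees that state, while sampling picks each action roughly in proportion to its probability.

```python
import random
from collections import Counter

def greedy_action(probs):
    # Index of the highest-probability action (ties go to the lowest index).
    return max(range(len(probs)), key=lambda a: probs[a])

def sampled_action(probs, rng):
    # Draw one action index with probability proportional to probs.
    return rng.choices(range(len(probs)), weights=probs)[0]

probs = [0.6, 0.3, 0.1]  # hypothetical policy output for one state
rng = random.Random(0)
greedy_counts = Counter(greedy_action(probs) for _ in range(1000))
sampled_counts = Counter(sampled_action(probs, rng) for _ in range(1000))
print(greedy_counts)   # always action 0
print(sampled_counts)  # roughly 600 / 300 / 100 across actions 0, 1, 2
```

The difference matters for the opponent: with argmax it behaves deterministically in every state, whereas with sampling it behaves exactly like the stochastic policy being trained.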

And wouldn't it always be better to train …

