all AI news
Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes. (arXiv:2206.01011v1 [cs.LG])
cs.LG updates on arXiv.org arxiv.org
Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes
a parameterized policy model for an expected return using gradient ascent.
Given a well-parameterized policy model, such as a neural network model, with
appropriate initial parameters, the PG algorithms work well even when
environment does not have the Markov property. Otherwise, they can be trapped
on a plateau or suffer from peakiness effects. As another successful RL
approach, algorithms based on Monte-Carlo Tree Search (MCTS), which include
AlphaZero, have …
algorithms arxiv decision gradient markov monte-carlo policy processes search tree