March 22, 2024, 4:42 a.m. | Kimon Protopapas, Anas Barakat

cs.LG updates on arXiv.org

arXiv:2403.14156v1 Announce Type: new
Abstract: Policy Mirror Descent (PMD) stands as a versatile algorithmic framework encompassing several seminal policy gradient algorithms such as natural policy gradient, with connections to state-of-the-art reinforcement learning (RL) algorithms such as TRPO and PPO. PMD can be seen as a soft Policy Iteration algorithm implementing regularized 1-step greedy policy improvement. However, 1-step greedy policies might not be the best choice, and remarkable recent empirical successes in RL such as AlphaGo and AlphaZero have demonstrated that …
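To make the "regularized 1-step greedy policy improvement" concrete, here is a minimal sketch (not from the paper) of one tabular PMD step with a KL / negative-entropy mirror map, under which the update reduces to the well-known multiplicative rule pi_{k+1}(a|s) ∝ pi_k(a|s) · exp(eta · Q^{pi_k}(s, a)). The function name, step size, and random Q-values are illustrative assumptions.

```python
import numpy as np

def pmd_step(pi, Q, eta):
    """One tabular Policy Mirror Descent update with a KL mirror map.

    pi  : array of shape (n_states, n_actions), current policy pi_k(a|s)
    Q   : array of shape (n_states, n_actions), action values Q^{pi_k}(s, a)
    eta : step size of the mirror-descent step
    """
    logits = np.log(pi) + eta * Q                  # mirror-descent step in the dual (log) space
    logits -= logits.max(axis=1, keepdims=True)    # subtract max per state for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)  # normalize back onto the simplex

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 4, 3
    pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy
    Q = rng.normal(size=(n_states, n_actions))            # placeholder Q^{pi_k}, purely illustrative
    print(pmd_step(pi, Q, eta=0.5))
```

With the negative-entropy mirror map this single step is exactly a softmax (soft-greedy) improvement, which is why PMD can be read as soft Policy Iteration; other mirror maps yield different regularized greedy steps.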
