Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration. (arXiv:2206.02036v1 [cs.LG])

June 7, 2022, 1:11 a.m. | Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald

stat.ML updates on arXiv.org

Neural replicator dynamics (NeuRD) is an alternative to the foundational
softmax policy gradient (SPG) algorithm, motivated by online learning and
evolutionary game theory. The NeuRD expected update is designed to be nearly
identical to that of SPG; however, we show that the Monte Carlo updates differ
in a substantial way: the importance correction accounting for a sampled action
is nullified in the SPG update, but not in the NeuRD update. Naturally, this
causes the NeuRD update to have higher variance …
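
The contrast described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a single-state (bandit) problem, a softmax policy over logits, and the standard importance-weighted value estimate (reward divided by the probability of the sampled action, zero for the other actions). All function names are hypothetical. Under these assumptions, the SPG logit update multiplies the estimate by the policy probability, which cancels the importance weight, while the NeuRD logit update omits that factor, so the importance weight remains.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spg_sample_update(policy, action, reward):
    """Sampled SPG logit update (assumed form).

    policy[a] * (q_hat[a] - v_hat), with q_hat[a] = reward / policy[action]
    on the sampled action and 0 elsewhere, and v_hat = reward.
    The 1 / policy[action] factor cancels against policy[a].
    """
    return [reward * ((1.0 if a == action else 0.0) - policy[a])
            for a in range(len(policy))]


def neurd_sample_update(policy, action, reward):
    """Sampled NeuRD logit update (assumed form).

    q_hat[a] - v_hat, without the policy[a] factor, so the
    1 / policy[action] importance weight stays in the update.
    """
    return [(reward / policy[action] if a == action else 0.0) - reward
            for a in range(len(policy))]


def expected_update(update_fn, policy, rewards):
    """Exact expectation of a sampled update, enumerating all actions."""
    n = len(policy)
    mean = [0.0] * n
    for action in range(n):
        update = update_fn(policy, action, rewards[action])
        for a in range(n):
            mean[a] += policy[action] * update[a]
    return mean


def total_variance(update_fn, policy, rewards):
    """Sum over logits of the variance of the sampled update."""
    n = len(policy)
    mean = expected_update(update_fn, policy, rewards)
    var = 0.0
    for action in range(n):
        update = update_fn(policy, action, rewards[action])
        var += policy[action] * sum((update[a] - mean[a]) ** 2
                                    for a in range(n))
    return var


if __name__ == "__main__":
    # Hypothetical example: deterministic rewards, one rarely sampled action.
    policy = softmax([2.0, 0.0, -2.0])
    rewards = [1.0, 0.5, 2.0]
    print("SPG variance:  ", total_variance(spg_sample_update, policy, rewards))
    print("NeuRD variance:", total_variance(neurd_sample_update, policy, rewards))
```

In this sketch the expected SPG update equals `policy[a] * (rewards[a] - v)` and the expected NeuRD update equals `rewards[a] - v`, where `v` is the policy's expected reward. The NeuRD sample grows without bound as the probability of the sampled action shrinks, which is one way to see the higher variance the abstract mentions.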


