Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration. (arXiv:2206.02036v1 [cs.LG])
June 7, 2022, 1:11 a.m. | Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald
stat.ML updates on arXiv.org arxiv.org
Neural replicator dynamics (NeuRD) is an alternative to the foundational
softmax policy gradient (SPG) algorithm motivated by online learning and
evolutionary game theory. The NeuRD expected update is designed to be nearly
identical to that of SPG; however, we show that the Monte Carlo updates differ
in a substantial way: the importance correction accounting for a sampled action
is nullified in the SPG update, but not in the NeuRD update. Naturally, this
causes the NeuRD update to have higher variance …
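
The difference between the two sampled updates can be illustrated in a single-state (bandit) setting. The sketch below is not taken from the paper. It assumes a tabular softmax policy over logits, one sampled action with an observed reward, and the usual importance-weighted reward estimate; the function names `spg_sample_update` and `neurd_sample_update` are hypothetical. Under those assumptions, the factor 1/pi(action) cancels in the SPG-style update but remains in the NeuRD-style update.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def spg_sample_update(logits, action, reward):
    # Assumed SPG-style sample update: reward * grad log pi(action).
    # Written with the importance-weighted estimate q_hat this is
    # pi * (q_hat - pi @ q_hat), where the 1/pi(action) factor cancels.
    pi = softmax(logits)
    one_hot = np.zeros_like(pi)
    one_hot[action] = 1.0
    return reward * (one_hot - pi)

def neurd_sample_update(logits, action, reward):
    # Assumed NeuRD-style sample update: the estimated advantage itself,
    # q_hat - pi @ q_hat, with no multiplication by pi, so the
    # 1/pi(action) importance weight stays in the update.
    pi = softmax(logits)
    q_hat = np.zeros_like(pi)
    q_hat[action] = reward / pi[action]
    return q_hat - pi @ q_hat

# Illustrative values only: a policy that rarely picks the last action.
logits = np.array([2.0, 0.0, -2.0])
rare_action = 2
print(spg_sample_update(logits, rare_action, 1.0))
print(neurd_sample_update(logits, rare_action, 1.0))
```

With these example logits, the NeuRD-style entry for the rarely sampled action scales with 1/pi(action) while the SPG-style entry stays bounded by the reward magnitude, which is consistent with the higher variance the abstract describes.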