The Phenomenon of Policy Churn. (arXiv:2206.00730v3 [cs.LG] UPDATED)
Oct. 24, 2022, 1:14 a.m. | Tom Schaul, André Barreto, John Quan, Georg Ostrovski
stat.ML updates on arXiv.org
We identify and study the phenomenon of policy churn, that is, the rapid
change of the greedy policy in value-based reinforcement learning. Policy churn
operates at a surprisingly rapid pace, changing the greedy action in a large
fraction of states within a handful of learning updates (in a typical deep RL
set-up such as DQN on Atari). We characterise the phenomenon empirically,
verifying that it is not limited to specific algorithm or environment
properties. A number of ablations help whittle …
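The quantity the abstract describes, the fraction of states whose greedy action changes after learning updates, can be illustrated with a minimal sketch. This is not the authors' code: the function names `greedy_actions` and `policy_churn` and the small Q-tables are hypothetical, and a tabular setting is assumed in place of a deep RL agent such as DQN.

```python
def greedy_actions(q_values):
    """Return the argmax action per state (ties go to the lowest action index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_values]


def policy_churn(q_before, q_after):
    """Fraction of states whose greedy action differs between two Q-tables."""
    if not q_before or len(q_before) != len(q_after):
        raise ValueError("Q-tables must be non-empty and cover the same states")
    before = greedy_actions(q_before)
    after = greedy_actions(q_after)
    changed = sum(a != b for a, b in zip(before, after))
    return changed / len(before)


# Hypothetical Q-values for 4 states and 3 actions, before and after an update.
q_before = [[1.0, 0.5, 0.2], [0.1, 0.9, 0.3], [0.4, 0.4, 0.8], [0.7, 0.6, 0.1]]
q_after = [[1.0, 1.1, 0.2], [0.1, 0.9, 0.3], [0.4, 0.9, 0.8], [0.7, 0.6, 0.1]]

churn = policy_churn(q_before, q_after)
print(churn)  # greedy action changed in 2 of 4 states -> 0.5
```

With a neural network, the same comparison would presumably be made on a batch of sampled states, using the network's Q-value outputs before and after the updates.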