all AI news
POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning. (arXiv:2205.11357v2 [cs.LG] UPDATED)
Sept. 8, 2022, 1:11 a.m. | Frederik Schubert, Carolin Benjamins, Sebastian Döhler, Bodo Rosenhahn, Marius Lindauer
cs.LG updates on arXiv.org arxiv.org
The goal of Unsupervised Reinforcement Learning (URL) is to find a
reward-agnostic prior policy on a task domain, such that the sample-efficiency
on supervised downstream tasks is improved. Although agents initialized with
such a prior policy can achieve a significantly higher reward with fewer
samples when finetuned on the downstream task, it is still an open question how
an optimal pretrained prior policy can be achieved in practice. In this work,
we present POLTER (Policy Trajectory Ensemble Regularization) - a …
arxiv ensemble policy regularization reinforcement reinforcement learning unsupervised
More from arxiv.org / cs.LG updates on arXiv.org
Regularization by Texts for Latent Diffusion Inverse Solvers
1 day, 20 hours ago |
arxiv.org
When can transformers reason with abstract symbols?
1 day, 20 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Scientist (m/f/x/d)
@ Symanto Research GmbH & Co. KG | Spain, Germany
Enterprise Data Quality, Senior Analyst
@ Toyota North America | Plano
Data Analyst & Audit Management Software (AMS) Coordinator
@ World Vision | Philippines - Home Working
Product Manager Power BI Platform Tech I&E Operational Insights
@ ING | HBP (Amsterdam - Haarlerbergpark)
Sr. Director, Software Engineering, Clinical Data Strategy
@ Moderna | USA-Washington-Seattle-1099 Stewart Street
Data Engineer (Data as a Service)
@ Xplor | Atlanta, GA, United States