Sept. 8, 2022, 1:11 a.m. | Frederik Schubert, Carolin Benjamins, Sebastian Döhler, Bodo Rosenhahn, Marius Lindauer

cs.LG updates on arXiv.org

The goal of Unsupervised Reinforcement Learning (URL) is to find a
reward-agnostic prior policy on a task domain, such that sample efficiency
on supervised downstream tasks is improved. Although agents initialized with
such a prior policy can achieve a significantly higher reward with fewer
samples when finetuned on the downstream task, it is still an open question how
an optimal pretrained prior policy can be achieved in practice. In this work,
we present POLTER (Policy Trajectory Ensemble Regularization), a …
