March 4, 2022, 2:12 a.m. | Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp

cs.LG updates on arXiv.org

Model-free reinforcement learning algorithms can compute policy gradients
given sampled environment transitions, but require large amounts of data. In
contrast, model-based methods can use the learned model to generate new data,
but model errors and bias can render learning unstable or suboptimal. In this
paper, we present a novel method that combines real-world data and a learned
model in order to get the best of both worlds. The core idea is to exploit the
real-world data for on-policy predictions and …

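The abstract is cut off, but the stated core idea, anchoring predictions on real on-policy transitions while using the learned model only to extrapolate to other actions, can be sketched. The snippet below is a minimal, illustrative reading of that idea rather than the paper's actual algorithm: the `LearnedModel` class, the linear dynamics stub, and `corrected_next_state` are all hypothetical placeholders.

```python
import numpy as np


# Hypothetical learned dynamics model: predicts the next state from (state, action).
# In practice this would be a neural network; a small linear stub keeps the sketch runnable.
class LearnedModel:
    def __init__(self, state_dim, action_dim, rng):
        self.A = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.B = rng.normal(scale=0.1, size=(state_dim, action_dim))

    def predict(self, state, action):
        return state + self.A @ state + self.B @ action


def corrected_next_state(model, state, real_action, real_next_state, new_action):
    """Combine a real observed transition with a learned model.

    The real next state anchors the prediction; the model only supplies the
    *difference* caused by deviating from the logged action, so model bias that
    is shared between the two model predictions cancels out.
    """
    model_delta = model.predict(state, new_action) - model.predict(state, real_action)
    return real_next_state + model_delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = LearnedModel(state_dim=3, action_dim=2, rng=rng)
    s = rng.normal(size=3)
    a_logged = rng.normal(size=2)     # action taken in the real environment
    s_next_real = rng.normal(size=3)  # observed next state for that action
    a_new = rng.normal(size=2)        # action proposed by the current policy
    print(corrected_next_state(model, s, a_logged, s_next_real, a_new))
```

In this reading, real data dominates whenever the new action matches the logged one, and the learned model only fills in how the outcome would change for a different action, which is one way a method could "get the best of both worlds" as the abstract describes.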