Jan. 28, 2022, 2:10 a.m. | Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

cs.LG updates on arXiv.org (arxiv.org)

Reward-free reinforcement learning (RL) considers the setting where the agent
does not have access to a reward function during exploration, but must propose
a near-optimal policy for an arbitrary reward function revealed only after
exploring. In the tabular setting, it is well known that this is a more
difficult problem than PAC RL -- where the agent has access to the reward
function during exploration -- with optimal sample complexities in the two
settings differing by a factor of …
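To make the two-phase protocol concrete, below is a minimal sketch on a toy tabular MDP: the agent explores without seeing any reward, builds an empirical transition model, and only afterwards receives an arbitrary reward function and plans against it. The names (`explore_reward_free`, `plan_with_reward`) and the uniform-random exploration strategy are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Sketch of the reward-free RL protocol on a toy tabular MDP.
# Uniform-random exploration is a placeholder for the actual exploration scheme.

rng = np.random.default_rng(0)

S, A, H = 5, 2, 10                                 # states, actions, horizon
P = rng.dirichlet(np.ones(S), size=(S, A))         # true transition kernel P[s, a] -> dist over next states


def explore_reward_free(num_episodes: int) -> np.ndarray:
    """Phase 1: explore without observing any reward; estimate transitions."""
    counts = np.zeros((S, A, S))
    for _ in range(num_episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(A)                    # naive uniform exploration
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    # Empirical transition model; fall back to uniform where a pair was never visited.
    totals = counts.sum(axis=-1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)


def plan_with_reward(P_hat: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Phase 2: the reward r[s, a] is revealed only now; run finite-horizon value iteration."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P_hat @ V                          # Q[s, a] = r[s, a] + E[V(s')]
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi


P_hat = explore_reward_free(num_episodes=2000)
reward = rng.random((S, A))                        # arbitrary reward revealed only after exploration
policy = plan_with_reward(P_hat, reward)
print(policy[0])                                   # greedy first-step action for each state
```

The key contrast with PAC RL is visible in the structure: exploration must collect data that is useful for *every* possible reward, since the reward only appears in the planning phase.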

arxiv decision markov processes rl
