Web: http://arxiv.org/abs/2201.11206

Jan. 28, 2022, 2:10 a.m. | Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

cs.LG updates on arXiv.org

Reward-free reinforcement learning (RL) considers the setting where the agent
does not have access to a reward function during exploration, but must propose
a near-optimal policy for an arbitrary reward function revealed only after
exploring. In the tabular setting, it is well known that this is a more
difficult problem than PAC RL -- where the agent has access to the reward
function during exploration -- with optimal sample complexities in the two
settings differing by a factor of …
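A minimal Python sketch can make the two-phase reward-free protocol concrete for a tabular, finite-horizon MDP. This is not the algorithm from the paper. The following are all illustrative assumptions: the uniformly random exploration policy, the fixed initial state 0, the hypothetical helpers `explore`, `empirical_model` and `plan`, and the uniform fallback for unvisited state-action pairs. The sketch shows only the shape of the setting. The exploration phase sees transitions but never a reward. The planning phase can then be rerun on the same data for any reward revealed afterward.

```python
import numpy as np

def explore(P_true, H, n_episodes, rng):
    """Exploration phase: only transitions are observed, never rewards.
    Assumption: a uniformly random policy from a fixed start state 0
    (illustrative only, not the paper's exploration strategy)."""
    S, A, _ = P_true.shape
    counts = np.zeros((S, A, S))
    for _ in range(n_episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P_true[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def empirical_model(counts):
    """Maximum-likelihood transition estimate from visit counts.
    Assumption: unvisited (s, a) pairs fall back to a uniform next state."""
    S = counts.shape[0]
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)

def plan(P_hat, reward, H):
    """Planning phase: the reward (shape S x A, values in [0, 1]) is revealed
    only now. Finite-horizon value iteration on the empirical model."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward + P_hat @ V  # (S, A, S) @ (S,) -> (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

# Usage: explore once, then plan for two different rewards on the same data.
rng = np.random.default_rng(0)
S, A, H = 4, 2, 5
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # random MDP, shape (S, A, S)
P_hat = empirical_model(explore(P_true, H, n_episodes=200, rng=rng))
for reward in (rng.random((S, A)), np.eye(S, A)):
    policy, V = plan(P_hat, reward, H)
```

Reusing `P_hat` across rewards captures the distinction the abstract draws with PAC RL. There, the reward is known during exploration, and the data collection can focus on it. Here, the same samples must support near-optimal planning for whichever reward arrives later.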

