Feb. 24, 2022, 2:11 a.m. | Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

cs.LG updates on arXiv.org

Offline Reinforcement Learning (RL) aims to learn policies from previously
collected datasets without exploring the environment. Directly applying
off-policy algorithms to offline RL usually fails due to the extrapolation
error caused by out-of-distribution (OOD) actions. Previous methods tackle
this problem by penalizing the Q-values of OOD actions or constraining the
trained policy to be close to the behavior policy. Nevertheless, such methods
typically prevent the generalization of value functions beyond the offline data
and also lack precise characterization of …
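The OOD-penalization idea mentioned above can be illustrated with a minimal tabular sketch: fit Q-values by Bellman backups on the fixed dataset only, and push down the Q-values of state-action pairs that never appear in the data so the greedy policy avoids them. The toy MDP, penalty form, and hyperparameters here are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Hypothetical toy setup: 4 states, 2 actions (illustrative, not from the paper).
n_states, n_actions = 4, 2

# Fixed offline dataset of (s, a, r, s') transitions from a behavior policy.
dataset = [(0, 0, 1.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 3), (3, 0, 0.0, 0)]

# Mark which (s, a) pairs appear in the data; everything else is OOD.
in_data = np.zeros((n_states, n_actions), dtype=bool)
for s, a, _, _ in dataset:
    in_data[s, a] = True

Q = np.zeros((n_states, n_actions))
gamma, lr, penalty = 0.9, 0.1, 10.0  # penalty strength is an assumed hyperparameter

for _ in range(500):
    for s, a, r, s2 in dataset:
        # Standard Bellman backup, using only in-distribution transitions.
        target = r + gamma * Q[s2].max()
        Q[s, a] += lr * (target - Q[s, a])
    # Penalize Q-values of OOD actions, bounded below at -penalty.
    Q[~in_data] = np.maximum(Q[~in_data] - lr * penalty, -penalty)

policy = Q.argmax(axis=1)  # greedy policy stays on in-distribution actions
```

The clamp at `-penalty` keeps OOD values finite; without some penalty, `Q[s2].max()` in the backup could bootstrap from an overestimated OOD action, which is exactly the extrapolation error the abstract describes.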

