Jan. 21, 2022, 2:11 a.m. | Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal

cs.LG updates on arXiv.org

The standard formulation of Reinforcement Learning lacks a practical way of
specifying which behaviors are admissible and which are forbidden. Most often,
practitioners approach behavior specification by manually engineering the
reward function, a counter-intuitive process that requires several iterations
and is prone to reward hacking by the agent. In this work, we argue that
constrained RL, which has almost exclusively been used for safe RL, also has
the potential to significantly reduce the amount of work spent for …
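To make the contrast concrete: instead of folding a behavioral requirement into the reward as a hand-tuned penalty, constrained RL keeps the requirement as an explicit constraint, commonly handled via a Lagrangian relaxation whose multiplier is adapted automatically. The following is a minimal toy sketch of that idea (not the paper's method): a two-action softmax policy is trained to maximize reward subject to an expected-cost limit, with primal gradient ascent on the policy and dual ascent on the multiplier. All quantities (rewards, costs, the 0.3 limit, learning rates) are made-up illustration values.

```python
import numpy as np

# Toy constrained problem: two actions with known per-action reward and cost.
rewards = np.array([1.0, 0.5])
costs = np.array([1.0, 0.0])
cost_limit = 0.3  # behavioral constraint: E[cost] <= 0.3 (illustrative value)

theta = np.zeros(2)   # policy logits
lam = 0.0             # Lagrange multiplier for the cost constraint
lr_theta, lr_lam = 0.1, 0.1

for _ in range(5000):
    pi = np.exp(theta) / np.exp(theta).sum()   # softmax policy
    # Lagrangian objective: E[reward] - lam * (E[cost] - cost_limit)
    adv = rewards - lam * costs                # per-action penalized value
    grad = pi * (adv - pi @ adv)               # exact softmax policy gradient
    theta += lr_theta * grad                   # primal ascent on the Lagrangian
    # Dual ascent: lam grows while the constraint is violated, projected to >= 0.
    lam = max(0.0, lam + lr_lam * (pi @ costs - cost_limit))

pi = np.exp(theta) / np.exp(theta).sum()
print(pi @ costs, lam)  # expected cost drifts toward the 0.3 limit
```

The practical appeal argued for in the abstract is visible even in this toy: the constraint threshold (0.3) is stated directly in the units of the behavior, and the penalty weight `lam` is learned rather than hand-tuned by the practitioner.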

