Sept. 22, 2022, 1:11 a.m. | Audrey Huang, Liu Leqi, Zachary Chase Lipton, Kamyar Azizzadenesheli

cs.LG updates on arXiv.org

Addressing such diverse ends as safety, alignment with human preferences, and
the efficiency of learning, a growing line of reinforcement learning research
focuses on risk functionals that depend on the entire distribution of returns.
Recent work on off-policy risk assessment (OPRA) for contextual bandits
introduced consistent estimators for the target policy's CDF of returns along
with finite-sample guarantees that extend to (and hold simultaneously over) all
risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where …
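
To make the idea of estimating a target policy's CDF of returns from logged data concrete, here is a minimal sketch of a standard importance-weighted empirical CDF for the contextual bandit setting. It is an illustration under stated assumptions, not necessarily the estimator used in the paper: the function names `is_cdf_estimate` and `quantile_from_cdf`, the argument names, and the clipping of the estimate to 1 are all hypothetical choices made for this example.

```python
from bisect import bisect_right


def is_cdf_estimate(returns, target_probs, behavior_probs, clip=True):
    """Importance-weighted empirical CDF of returns under the target policy.

    returns[i]        : observed return of logged sample i
    target_probs[i]   : probability the target policy gives the logged action
    behavior_probs[i] : probability the behavior policy gave it (must be > 0)
    clip              : cap the estimate at 1.0 (an assumption of this sketch)
    """
    n = len(returns)
    weights = [p / q for p, q in zip(target_probs, behavior_probs)]

    # Sort samples by return so the CDF becomes a running sum of weights.
    order = sorted(range(n), key=lambda i: returns[i])
    xs, cum = [], []
    total = 0.0
    for i in order:
        total += weights[i] / n
        xs.append(returns[i])
        cum.append(min(total, 1.0) if clip else total)

    def cdf(t):
        # Number of logged returns that are <= t.
        k = bisect_right(xs, t)
        return 0.0 if k == 0 else cum[k - 1]

    return cdf


def quantile_from_cdf(cdf, candidates, alpha):
    """Smallest candidate t with cdf(t) >= alpha (a plug-in quantile)."""
    for t in sorted(candidates):
        if cdf(t) >= alpha:
            return t
    return None
```

Once the CDF estimate is available, any risk functional that depends only on the distribution of returns can be computed as a plug-in from it; `quantile_from_cdf` shows this for a quantile. For example, `quantile_from_cdf(is_cdf_estimate(r, p, q), r, 0.5)` gives a plug-in median of the returns `r` under the target policy.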

