Off-Policy Risk Assessment in Markov Decision Processes. (arXiv:2209.10444v1 [cs.LG])
Sept. 22, 2022, 1:11 a.m. | Audrey Huang, Liu Leqi, Zachary Chase Lipton, Kamyar Azizzadenesheli
cs.LG updates on arXiv.org arxiv.org
Addressing such diverse ends as safety, alignment with human preferences, and
the efficiency of learning, a growing line of reinforcement learning research
focuses on risk functionals that depend on the entire distribution of returns.
Recent work on off-policy risk assessment (OPRA) for contextual bandits
introduced consistent estimators for the target policy's CDF of returns along
with finite sample guarantees that extend to (and hold simultaneously over) all
risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where …
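
As a rough illustration of the idea described above, the sketch below builds an importance-weighted empirical CDF of returns and reads one risk functional (a lower-tail CVaR) off it. It is not the paper's estimator: the function names `is_cdf_estimate` and `cvar_lower` are hypothetical, the weights are assumed to be precomputed, non-negative ratios of target to behavior policy probabilities, and clipping the CDF at 1 is a simplifying choice made here.

```python
def is_cdf_estimate(returns, weights):
    """Importance-weighted empirical CDF of returns (illustrative sketch).

    returns[i]: return observed while following the behavior policy
    weights[i]: assumed precomputed, non-negative importance weight
                (target policy probability / behavior policy probability)
    Returns the sorted distinct return values and the estimated CDF at each,
    clipped to at most 1.
    """
    n = len(returns)
    xs, cdf = [], []
    running = 0.0
    for x, w in sorted(zip(returns, weights)):
        running += w / n
        value = min(1.0, running)
        if xs and xs[-1] == x:
            cdf[-1] = value  # merge ties into a single step
        else:
            xs.append(x)
            cdf.append(value)
    return xs, cdf


def cvar_lower(xs, cdf, alpha):
    """Plug-in lower-tail CVaR: mean return over the worst alpha fraction."""
    total, prev = 0.0, 0.0
    for x, c in zip(xs, cdf):
        capped = min(c, alpha)
        total += x * (capped - prev)  # mass this step adds below level alpha
        prev = capped
        if prev >= alpha:
            break
    if prev < alpha:
        # Assumption: if the estimated CDF tops out below alpha,
        # the missing mass is assigned to the largest observed return.
        total += xs[-1] * (alpha - prev)
    return total / alpha


# Hypothetical logged data: four returns with their importance weights.
xs, cdf = is_cdf_estimate([0.0, 1.0, 2.0, 3.0], [2.0, 1.0, 1.0, 0.0])
print(xs, cdf, cvar_lower(xs, cdf, 0.75))
```

Because the risk value is computed from the estimated CDF rather than from the raw samples, the same `xs, cdf` pair can be reused for other distribution-based risk functionals, which is the sense in which a single CDF estimate serves many risks at once.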