Aug. 10, 2023, 4:43 a.m. | Leo Benac, Sonali Parbhoo, Finale Doshi-Velez

cs.LG updates on arXiv.org arxiv.org

Offline reinforcement learning is commonly used for sequential
decision-making in domains such as healthcare and education, where the rewards
are known and the transition dynamics $T$ must be estimated from
batch data. A key challenge for all tasks is how to learn a reliable estimate
of the transition dynamics $T$ that produces near-optimal policies that are safe
enough that they never take actions far from the best action
with respect to their value …
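
To make the setting concrete, here is a minimal sketch of the generic tabular version of the problem: rewards are known, $T$ is estimated from a batch of (state, action, next state) tuples, and a policy is then planned against the estimate. This is not the authors' method, which the truncated abstract does not describe. The function names, the symmetric Dirichlet pseudo-count `prior`, the toy batch, and the reward table are all assumptions made for illustration.

```python
def estimate_transitions(batch, n_states, n_actions, prior=1.0):
    """Posterior-mean estimate of T[s][a][s'] from (s, a, s') tuples.

    `prior` is a symmetric Dirichlet pseudo-count (an assumption of this
    sketch); it keeps unseen transitions from getting probability zero.
    """
    counts = [[[prior] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    for s, a, s_next in batch:
        counts[s][a][s_next] += 1.0
    # Normalise each (s, a) row of counts into a probability distribution.
    return [[[c / sum(row) for c in row] for row in counts[s]]
            for s in range(n_states)]


def value_iteration(T, R, gamma=0.9, tol=1e-8):
    """Plan against the estimated dynamics T with known rewards R[s][a]."""
    n_states, n_actions = len(T), len(T[0])
    V = [0.0] * n_states
    while True:
        Q = [[R[s][a] + gamma * sum(T[s][a][s2] * V[s2]
                                    for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            # Greedy policy with respect to the final Q table.
            return V_new, [q.index(max(q)) for q in Q]
        V = V_new


# Hypothetical toy batch: 2 states, 2 actions, reward 1 for acting in state 1.
batch = ([(0, 1, 1)] * 8 + [(0, 1, 0)] * 2 + [(0, 0, 0)] * 9 + [(0, 0, 1)] * 1
         + [(1, 0, 1)] * 9 + [(1, 0, 0)] * 1 + [(1, 1, 0)] * 8 + [(1, 1, 1)] * 2)
R = [[0.0, 0.0], [1.0, 1.0]]

T_hat = estimate_transitions(batch, n_states=2, n_actions=2)
V, policy = value_iteration(T_hat, R)
```

A point estimate like `T_hat` says nothing about how uncertain each row is, so a policy planned against it can look near-optimal while resting on rows with very few observations. That gap is the kind of reliability concern the abstract raises.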
