June 15, 2022, 1:11 a.m. | Florent Delgrange, Ann Nowé, Guillermo A. Pérez

cs.LG updates on arXiv.org arxiv.org

We consider the challenge of policy simplification and verification in the
context of policies learned through reinforcement learning (RL) in continuous
environments. In well-behaved settings, RL algorithms have convergence
guarantees in the limit. While these guarantees are valuable, they are
insufficient for safety-critical applications. Furthermore, they are lost when
applying advanced techniques such as deep-RL. To recover guarantees when
applying advanced RL algorithms to more complex environments with (i)
reachability, (ii) safety-constrained reachability, or (iii) discounted-reward
objectives, we build upon …

arxiv decision distillation lg markov processes report rl technical

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US