Feb. 21, 2024, 5:42 a.m. | Maksim Bobrin, Nazar Buzun, Dmitrii Krylov, Dmitry V. Dylov

cs.LG updates on arXiv.org

arXiv:2402.13037v1 Announce Type: new
Abstract: Offline reinforcement learning (RL) addresses the problem of sequential decision-making by learning an optimal policy from pre-collected data, without interacting with the environment. As yet, it has remained somewhat impractical, because one rarely knows the reward explicitly and it is hard to distill it retrospectively. Here, we show that an imitating agent can still learn the desired behavior merely from observing the expert, despite the absence of explicit rewards or action labels. In our method, AILOT …

