June 11, 2024, 4:48 a.m. | Kai Yan, Alexander G. Schwing, Yu-xiong Wang

cs.LG updates on arXiv.org arxiv.org

arXiv:2311.01331v3 Announce Type: replace
Abstract: In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods, as exemplified by SMODICE, minimize the state occupancy divergence between the learner's and empirical expert policies. However, …

arxiv cs.ai cs.lg observation offline primal replace state type via

Senior Data Engineer

@ Displate | Warsaw

Principal Architect

@ eSimplicity | Silver Spring, MD, US

Embedded Software Engineer

@ Carrier | CAN03: Carrier-Charlotte, NC 9701 Old Statesville Road, Charlotte, NC, 28269 USA

(USA) Software Engineer III

@ Roswell Park Comprehensive Cancer Center | (USA) CA SUNNYVALE Home Office SUNNYVALE III - 840 W CALIFORNIA

Experienced Manufacturing and Automation Engineer

@ Boeing | DEU - Munich, Germany

Software Engineering-Sr Engineer (Java 17, Python, Microservices, Spring Boot, REST)

@ FICO | Bengaluru, India