June 19, 2024, 4:45 a.m. | Zhirui Chen, Vincent Y. F. Tan

cs.LG updates on arXiv.org

arXiv:2406.12205v1 Announce Type: new
Abstract: We consider offline reinforcement learning (RL) with preference feedback in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the simple regret. We propose an algorithm, RL with Locally Optimal Weights (RL-LOW), which yields a simple regret of $\exp(-\Omega(n/H))$ where $n$ is the number …
