Feb. 19, 2024, 5:43 a.m. | Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman

cs.LG updates on arXiv.org

arXiv:2212.03733v2 Announce Type: replace
Abstract: Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our job in the learning process is to design reward functions to express desired behavior and enable the agent to learn such behavior swiftly. In this work, we consider the reward-design problem in tasks formulated as reaching desirable states and avoiding undesirable states. To start, we propose a strict partial ordering of the policy space to resolve trade-offs in behavior preference. We …
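As an illustration only, a strict partial ordering over policies in a reach/avoid task might look like Pareto-style strict dominance over two objectives. The `PolicyStats` fields and the dominance rule below are hypothetical assumptions for the sketch, not the ordering the paper actually proposes:

```python
from dataclasses import dataclass

# Hypothetical policy summary: each policy is scored by its probability of
# reaching a desirable state and of avoiding undesirable states. These
# fields are illustrative assumptions, not the paper's model.
@dataclass(frozen=True)
class PolicyStats:
    p_reach: float  # probability of eventually reaching a desirable state
    p_avoid: float  # probability of never entering an undesirable state

def strictly_preferred(a: PolicyStats, b: PolicyStats) -> bool:
    """Pareto-style strict dominance: a is preferred to b iff a is at
    least as good on both objectives and strictly better on one.
    The relation is irreflexive and transitive (a strict partial order),
    so some policy pairs remain incomparable -- a genuine trade-off."""
    at_least_as_good = a.p_reach >= b.p_reach and a.p_avoid >= b.p_avoid
    strictly_better = a.p_reach > b.p_reach or a.p_avoid > b.p_avoid
    return at_least_as_good and strictly_better

safe_slow = PolicyStats(p_reach=0.6, p_avoid=0.99)
fast_risky = PolicyStats(p_reach=0.9, p_avoid=0.7)
dominant = PolicyStats(p_reach=0.9, p_avoid=0.99)

print(strictly_preferred(dominant, safe_slow))   # True: better reach, equal avoid
print(strictly_preferred(safe_slow, fast_risky)) # False: incomparable trade-off
print(strictly_preferred(fast_risky, safe_slow)) # False: incomparable trade-off
```

Because the order is only partial, the incomparable pair above is exactly the kind of trade-off the paper's ordering is designed to resolve.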

