April 18, 2024, 4:43 a.m. | Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

stat.ML updates on arXiv.org arxiv.org

arXiv:2312.08358v2 Announce Type: replace-cross
Abstract: In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This captures common issues of data collection, such as having human annotators with varied preferences, cognitive processes that result in seemingly irrational behavior, and combining data labeled according to different criteria. We prove that standard applications of …

accounting arxiv context cs.ai cs.lg hidden rlhf stat.ml type understanding

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York