Feb. 12, 2024, 5:46 a.m. | James Y. Huang Sailik Sengupta Daniele Bonadiman Yi-an Lai Arshit Gupta Nikolaos Pappas Saab Mansour

cs.CL updates on arXiv.org

Large Language Models (LLMs) are expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning from Human Feedback (RLHF). However, it is unclear whether such methods effectively teach alignment objectives to the model. First, their inability to incorporate multiple custom rewards, and their reliance on a model developer's view of universal and static principles, are key limitations. Second, the residual gaps in model …
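The critique above points toward enforcing preferences at decoding time rather than baking them in during training. A minimal sketch of that idea, assuming a reward-guided re-ranking scheme (the function names, scorers, and weighting are hypothetical illustrations, not the paper's exact algorithm): candidate continuations are scored by the base model's likelihood plus a weighted sum of custom alignment rewards, so multiple rewards can be swapped in without retraining.

```python
# Toy sketch of decoding-time alignment via reward-guided re-ranking.
# All names here (rerank, lm_score, rewards) are hypothetical; the real
# method in the paper may differ. The key idea: combine the base model's
# score with custom alignment rewards at decoding time.

def rerank(candidates, lm_score, rewards, weights):
    """Return candidates sorted best-first by a combined score.

    candidates: list of candidate continuations (strings)
    lm_score:   fn str -> float, base model log-probability (assumed given)
    rewards:    list of fns str -> float, custom alignment objectives
    weights:    list of floats, one weight per reward
    """
    def combined(text):
        return lm_score(text) + sum(w * r(text) for w, r in zip(weights, rewards))
    return sorted(candidates, key=combined, reverse=True)

# Toy usage with stand-in scorers (not a real LM).
lm = lambda t: -0.1 * len(t)                       # pretend shorter = more probable
polite = lambda t: 1.0 if "please" in t else 0.0   # toy alignment reward
ranked = rerank(["open the door", "please open the door"], lm, [polite], [5.0])
best = ranked[0]  # the reward term outweighs the length penalty here
```

Because the rewards are plain functions applied at inference time, several of them can be combined or re-weighted per request, which is exactly the flexibility the training-time RLHF setup lacks.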

