Aligning language models with human preferences
April 19, 2024, 4:41 a.m. | Tomasz Korbak
cs.LG updates on arXiv.org
Abstract: Language models (LMs) trained on vast quantities of text data can acquire sophisticated skills such as generating summaries, answering questions, or generating code. However, they also manifest behaviors that violate human preferences: for example, they can generate offensive content or falsehoods, or perpetuate social biases. In this thesis, I explore several approaches to aligning LMs with human preferences. First, I argue that aligning LMs can be seen as Bayesian inference: conditioning a prior (base, pretrained LM) on …
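The abstract's Bayesian-inference framing is cut off mid-sentence. As a minimal sketch of what such conditioning typically looks like (the reward function r(x) and the coefficient β below are assumptions for illustration, not taken from the truncated abstract), the aligned distribution can be written as the base LM reweighted by exponentiated preference evidence:

```latex
% Sketch: aligned LM as a Bayesian posterior.
% Prior: the pretrained LM p_LM(x).
% Evidence of human preference: exp(beta * r(x)), where r(x) and beta
% are assumed here, not given in the truncated abstract.
\[
  p^{*}(x) \;=\; \frac{1}{Z}\, p_{\mathrm{LM}}(x)\,\exp\!\bigl(\beta\, r(x)\bigr),
  \qquad
  Z \;=\; \sum_{x} p_{\mathrm{LM}}(x)\,\exp\!\bigl(\beta\, r(x)\bigr).
\]
% The same posterior is the optimum of KL-regularized reward maximization:
\[
  p^{*} \;=\; \arg\max_{p}\;
  \mathbb{E}_{x \sim p}\bigl[r(x)\bigr]
  \;-\; \tfrac{1}{\beta}\,\mathrm{KL}\!\bigl(p \,\|\, p_{\mathrm{LM}}\bigr).
\]
```

Under this reading, the pretrained LM plays the role of the prior and preference data supplies the evidence; β controls how far the aligned (posterior) distribution is allowed to drift from that prior.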