all AI news
Evaluating Language Model Agency through Negotiations
March 19, 2024, 4:45 a.m. | Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West
cs.LG updates on arXiv.org arxiv.org
Abstract: We introduce an approach to evaluate language model (LM) agency using negotiation games. This approach better reflects real-world use cases and addresses some of the shortcomings of alternative LM benchmarks. Negotiation games enable us to study multi-turn, and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage. We use our approach to test six widely used and publicly accessible LMs, evaluating performance and alignment in both self-play and cross-play settings. Noteworthy findings include: (i) …
abstract agency arxiv benchmarks cases complexity cs.ai cs.cl cs.lg data data leakage evaluation games interactions language language model modulate negotiation negotiations study through type use cases world
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Business Data Scientist, gTech Ads
@ Google | Mexico City, CDMX, Mexico
Lead, Data Analytics Operations
@ Zocdoc | Pune, Maharashtra, India