all AI news
CogBench: a large language model walks into a psychology lab
Feb. 29, 2024, 5:42 a.m. | Julian Coda-Forno, Marcel Binz, Jane X. Wang, Eric Schulz
cs.LG updates on arXiv.org arxiv.org
Abstract: Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predominant focus on performance metrics in most benchmarks. This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. This novel approach offers a toolkit for phenotyping LLMs' behavior. We apply CogBench to 35 LLMs, yielding a rich and diverse dataset. We …
abstract advanced artificial artificial intelligence arxiv benchmark benchmarks cs.ai cs.cl cs.lg focus intelligence lab language language model language models large language large language model large language models llms metrics paper performance psychology them type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Senior Principal, Product Strategy Operations, Cloud Data Analytics
@ Google | Sunnyvale, CA, USA; Austin, TX, USA
Data Scientist - HR BU
@ ServiceNow | Hyderabad, India