April 8, 2024, 4:46 a.m. | Zhangdie Yuan, Chenxi Whitehouse, Eric Chamoun, Rami Aly, Andreas Vlachos

cs.CL updates on arXiv.org

arXiv:2404.03818v1 Announce Type: new
Abstract: This paper introduces PRobELM (Plausibility Ranking Evaluation for Language Models), a benchmark designed to assess language models' ability to discern more plausible from less plausible scenarios through their parametric knowledge. While benchmarks such as TruthfulQA emphasise factual accuracy or truthfulness, and others such as COPA explore plausible scenarios without explicitly incorporating world knowledge, PRobELM seeks to bridge this gap by evaluating models' ability to leverage world knowledge to prioritise plausible scenarios over less plausible alternatives. …
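To make the ranking setup concrete, below is a minimal, hypothetical sketch of how one might score candidate scenarios by their length-normalised log-likelihood under a causal language model and rank higher-scoring statements as more plausible. The model choice (`gpt2`) and the helper `plausibility_score` are illustrative assumptions, not the PRobELM evaluation protocol.

```python
# Illustrative sketch only: rank candidate statements by average token
# log-probability under a causal LM. Not the authors' PRobELM protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def plausibility_score(text: str) -> float:
    """Average token log-probability of `text` (higher = more plausible)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # loss over tokens; its negation is the mean log-probability.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()

candidates = [
    "Water boils at 100 degrees Celsius at sea level.",  # more plausible
    "Water boils at 10 degrees Celsius at sea level.",   # less plausible
]
ranked = sorted(candidates, key=plausibility_score, reverse=True)
print(ranked[0])  # expected: the more plausible statement ranked first
```

Length normalisation (averaging rather than summing log-probabilities) is one common way to avoid penalising longer candidates; a benchmark like PRobELM may use a different scoring scheme.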
