Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
April 15, 2024, 4:47 a.m. | Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank
cs.CL updates on arXiv.org
Abstract: Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large language models (LLMs). One common way to evaluate the model response is to rank the candidate answers based on the log probability of the first token prediction. An alternative way is to examine the text output. Prior work has shown that first token probabilities lack robustness to changes in MCQ phrasing, and that first token probabilities do not match text answers for …
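The two evaluation approaches the abstract contrasts can be illustrated with a minimal sketch. It is not the authors' code: the helper names `first_token_choice` and `text_choice`, the option labels, the toy log probabilities, and the toy response are all hypothetical, and the text parser is a deliberately naive regular expression.

```python
import math
import re

OPTIONS = ("A", "B", "C", "D")  # assumed option labels, not taken from the paper


def first_token_choice(logprobs, options=OPTIONS):
    """Rank options by the log probability of their label as the first token."""
    return max(options, key=lambda o: logprobs.get(o, -math.inf))


def text_choice(response, options=OPTIONS):
    """Naively extract the first standalone option letter from the generated text."""
    pattern = r"\b(" + "|".join(map(re.escape, options)) + r")\b"
    match = re.search(pattern, response)
    return match.group(1) if match else None


# Hypothetical model outputs for a single question (made-up values).
toy_logprobs = {"A": -0.9, "B": -1.2, "C": -2.5, "D": -3.0}
toy_response = "The correct answer is B, because ..."

first = first_token_choice(toy_logprobs)
text = text_choice(toy_response)
mismatch = first != text  # the two evaluation methods disagree on this toy example
```

In this toy case the first-token ranking and the parsed text answer pick different options, which is the kind of mismatch the abstract refers to. A real evaluation would read the log probabilities from the model and would need a more careful answer parser.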