all AI news
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
April 3, 2024, 4:46 a.m. | Philipp Mondorf, Barbara Plank
cs.CL updates on arXiv.org arxiv.org
Abstract: Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. However, despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty partly stems from the predominant focus on task performance, measured through shallow accuracy metrics, rather than a thorough investigation of the models' reasoning behavior. This paper seeks to address this gap by providing …
abstract accuracy arxiv behavior beyond capabilities cs.ai cs.cl however humans language language models large language large language models llms performance reasoning survey tasks type uncertain uncertainty
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
AI Engineering Manager
@ M47 Labs | Barcelona, Catalunya [Cataluña], Spain