Evaluating the Deductive Competence of Large Language Models

April 16, 2024, 4:51 a.m. | Spencer M. Seals, Valerie L. Shalin

cs.CL updates on arXiv.org

arXiv:2309.05452v2 Announce Type: replace
Abstract: The development of highly fluent large language models (LLMs) has prompted increased interest in assessing their reasoning and problem-solving capabilities. We investigate whether several LLMs can solve a classic type of deductive reasoning problem from the cognitive science literature. The tested LLMs have limited abilities to solve these problems in their conventional form. We performed follow-up experiments to investigate whether changes to the presentation format and content improve model performance. We do find performance …
