Dissociation of Faithful and Unfaithful Reasoning in LLMs | allainews.com

May 27, 2024, 4:49 a.m. | Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

cs.CL updates on arXiv.org

arXiv:2405.15092v1 Announce Type: cross
Abstract: Large language models (LLMs) improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. Our research investigates how LLMs recover from errors in Chain of Thought, reaching the correct final answer despite mistakes in the reasoning text. Through analysis of these error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, but we also identify many clear examples of faithful error recovery behaviors. We identify …
