Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
May 27, 2024, 4:49 a.m. | Boshi Wang, Xiang Yue, Yu Su, Huan Sun
cs.CL updates on arXiv.org
Abstract: We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.
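To make the setup concrete, below is a minimal, hypothetical PyTorch sketch of the kind of experiment the abstract describes: a small transformer is trained on atomic facts plus most of the "inferred" two-hop composition facts, and optimization continues far past the point where training accuracy saturates while held-out in-distribution composition queries are monitored for a delayed (grokked) jump in accuracy. This is not the authors' code; all names, sizes, splits, and hyperparameters are illustrative assumptions.

```python
# A toy grokking experiment on two-hop composition (illustrative sketch,
# not the paper's implementation). Train on atomic facts plus most inferred
# two-hop facts; keep optimizing far beyond train-set saturation and watch
# held-out in-distribution two-hop accuracy for a delayed rise.
import random
import torch
import torch.nn as nn

random.seed(0); torch.manual_seed(0)
N_ENTITIES, N_RELATIONS = 50, 10

# Atomic facts (head, relation) -> tail: the parametric knowledge to reason over.
atomic = {(h, r): random.randrange(N_ENTITIES)
          for h in range(N_ENTITIES) for r in range(N_RELATIONS)}

PAD = N_ENTITIES + N_RELATIONS  # filler token so 1-hop and 2-hop queries align
VOCAB = PAD + 1

def encode(h, r1, r2=None):
    return [h, N_ENTITIES + r1, PAD if r2 is None else N_ENTITIES + r2]

one_hop = [(encode(h, r), t) for (h, r), t in atomic.items()]
two_hop = [(encode(h, r1, r2), atomic[(atomic[(h, r1)], r2)])
           for h in range(N_ENTITIES)
           for r1 in range(N_RELATIONS)
           for r2 in range(N_RELATIONS)]
random.shuffle(two_hop)
split = int(0.9 * len(two_hop))
train_rows = one_hop + two_hop[:split]   # atomic facts + most inferred facts
test_rows = two_hop[split:]              # held-out in-distribution queries

def to_tensors(rows):
    return (torch.tensor([x for x, _ in rows]),
            torch.tensor([y for _, y in rows]))

class TinyTransformer(nn.Module):
    def __init__(self, d=128, n_layers=2, n_heads=4):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d)
        self.pos = nn.Embedding(3, d)  # every query is exactly 3 tokens
        layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d,
                                           dropout=0.0, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, N_ENTITIES)

    def forward(self, x):
        h = self.tok(x) + self.pos(torch.arange(x.size(1), device=x.device))
        return self.head(self.blocks(h)[:, -1])  # predict the tail entity

model = TinyTransformer()
# Weight decay matters here: grokking is typically reported with regularization.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
loss_fn = nn.CrossEntropyLoss()
xtr, ytr = to_tensors(train_rows)
xte, yte = to_tensors(test_rows)

for step in range(1, 50_001):  # far beyond where train accuracy saturates
    idx = torch.randint(0, len(ytr), (512,))
    loss = loss_fn(model(xtr[idx]), ytr[idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(xtr[:2048]).argmax(-1) == ytr[:2048]).float().mean()
            te = (model(xte).argmax(-1) == yte).float().mean()
        print(f"step {step:6d}  loss {loss:.3f}  train {tr:.2f}  test {te:.2f}")
```

The 9:1 split between trained and held-out inferred facts is an arbitrary choice for this sketch; the abstract's claim concerns what happens when such training is simply extended far beyond the point of overfitting.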