Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
May 27, 2024, 4:49 a.m. | Boshi Wang, Xiang Yue, Yu Su, Huan Sun
cs.CL updates on arXiv.org
Abstract: We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.
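To make the setup concrete, below is a minimal, hypothetical PyTorch sketch of the kind of experiment the abstract describes: a small transformer is trained on atomic facts plus most of the "inferred" two-hop composition facts, and optimization continues far past the point where training accuracy saturates while held-out in-distribution composition queries are monitored for a delayed (grokked) jump in accuracy. This is not the authors' code; all names, sizes, splits, and hyperparameters are illustrative assumptions.

```python
# A toy grokking experiment on two-hop composition (illustrative sketch,
# not the paper's implementation). Train on atomic facts plus most inferred
# two-hop facts; keep optimizing far beyond train-set saturation and watch
# held-out in-distribution two-hop accuracy for a delayed rise.
import random
import torch
import torch.nn as nn

random.seed(0); torch.manual_seed(0)
N_ENTITIES, N_RELATIONS = 50, 10

# Atomic facts (head, relation) -> tail: the parametric knowledge to reason over.
atomic = {(h, r): random.randrange(N_ENTITIES)
          for h in range(N_ENTITIES) for r in range(N_RELATIONS)}

PAD = N_ENTITIES + N_RELATIONS  # filler token so 1-hop and 2-hop queries align
VOCAB = PAD + 1

def encode(h, r1, r2=None):
    return [h, N_ENTITIES + r1, PAD if r2 is None else N_ENTITIES + r2]

one_hop = [(encode(h, r), t) for (h, r), t in atomic.items()]
two_hop = [(encode(h, r1, r2), atomic[(atomic[(h, r1)], r2)])
           for h in range(N_ENTITIES)
           for r1 in range(N_RELATIONS)
           for r2 in range(N_RELATIONS)]
random.shuffle(two_hop)
split = int(0.9 * len(two_hop))
train_rows = one_hop + two_hop[:split]   # atomic facts + most inferred facts
test_rows = two_hop[split:]              # held-out in-distribution queries

def to_tensors(rows):
    return (torch.tensor([x for x, _ in rows]),
            torch.tensor([y for _, y in rows]))

class TinyTransformer(nn.Module):
    def __init__(self, d=128, n_layers=2, n_heads=4):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d)
        self.pos = nn.Embedding(3, d)  # every query is exactly 3 tokens
        layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d,
                                           dropout=0.0, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, N_ENTITIES)

    def forward(self, x):
        h = self.tok(x) + self.pos(torch.arange(x.size(1), device=x.device))
        return self.head(self.blocks(h)[:, -1])  # predict the tail entity

model = TinyTransformer()
# Weight decay matters here: grokking is typically reported with regularization.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
loss_fn = nn.CrossEntropyLoss()
xtr, ytr = to_tensors(train_rows)
xte, yte = to_tensors(test_rows)

for step in range(1, 50_001):  # far beyond where train accuracy saturates
    idx = torch.randint(0, len(ytr), (512,))
    loss = loss_fn(model(xtr[idx]), ytr[idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(xtr[:2048]).argmax(-1) == ytr[:2048]).float().mean()
            te = (model(xte).argmax(-1) == yte).float().mean()
        print(f"step {step:6d}  loss {loss:.3f}  train {tr:.2f}  test {te:.2f}")
```

The 9:1 split between trained and held-out inferred facts is an arbitrary choice for this sketch; the abstract's claim concerns what happens when such training is simply extended far beyond the point of overfitting.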