Retrieval Head Mechanistically Explains Long-Context Factuality
April 25, 2024, 5:44 p.m. | Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu
cs.CL updates on arXiv.org
Abstract: Despite the recent progress in long-context language models, it remains unclear how transformer-based models acquire the capability to retrieve relevant information from arbitrary locations within a long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention head is largely responsible for retrieving information; we dub these retrieval heads. We identify intriguing properties of retrieval heads: (1) universal: all the explored models …
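To make the idea concrete, below is a minimal sketch (not the authors' code) of how a per-head "retrieval score" might be computed under a needle-in-a-haystack probe: a needle string is hidden in a long context, the model is asked to copy it out, and a head is credited whenever, while the model emits a needle token, that head's strongest attention lands on the matching token in the context. The function name, tensor layout, and scoring rule here are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch: score attention heads by how often their top-attended
# context token is the needle token the model is currently copying.
import torch

def retrieval_scores(attn, decode_steps, needle_positions):
    """
    attn: (layers, heads, decode_len, ctx_len) attention weights collected
          while the model decodes its answer (assumed layout).
    decode_steps: decoding steps at which the model emits a needle token.
    needle_positions: for each such step, that token's position in the context.
    Returns: (layers, heads) tensor of per-head retrieval scores in [0, 1].
    """
    L, H, _, _ = attn.shape
    hits = torch.zeros(L, H)
    total = 0
    for t, pos in zip(decode_steps, needle_positions):
        total += 1
        # Which context token does each head attend to most at this step?
        top = attn[:, :, t, :].argmax(dim=-1)  # (layers, heads)
        hits += (top == pos).float()
    return hits / max(total, 1)

# Tiny synthetic check: head (layer 0, head 1) always attends to the needle.
L, H, T, C = 2, 4, 3, 16
attn = torch.rand(L, H, T, C)
attn[0, 1, :, 7] = 10.0  # fake a retrieval head locked onto context position 7
scores = retrieval_scores(attn, decode_steps=[0, 1, 2], needle_positions=[7, 7, 7])
print(scores)  # ~1.0 at [0, 1]; near 1/16 for random heads
```

In practice the attention tensor would come from hooking a real model's attention layers during decoding; heads with consistently high scores across prompts would be the candidate retrieval heads the abstract describes.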