Retrieval Head Mechanistically Explains Long-Context Factuality
April 25, 2024, 5:44 p.m. | Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu
Source: cs.CL updates on arXiv.org (arxiv.org)
Abstract: Despite recent progress in long-context language models, it remains elusive how transformer-based models retrieve relevant information from arbitrary locations within a long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention head is largely responsible for retrieving information; we dub these retrieval heads. We identify intriguing properties of retrieval heads: (1) universal: all the explored models …