Retrieval Head Mechanistically Explains Long-Context Factuality

April 25, 2024, 5:44 p.m. | Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu

cs.CL updates on arXiv.org

arXiv:2404.15574v1 Announce Type: new
Abstract: Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention head, which we dub retrieval heads, is largely responsible for retrieving information. We identify intriguing properties of retrieval heads: (1) universal: all the explored models …
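To make the idea of a retrieval head concrete, here is a minimal sketch of one way to probe for such heads with Hugging Face transformers: hide a "needle" sentence inside filler text, then score every attention head by how much attention mass its final query token places on the needle's token span. This attention-mass readout is a simplified proxy, not the paper's detection statistic, and the model name, needle, filler, and top-5 printout below are all illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: small stand-in; the paper studies long-context LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, attn_implementation="eager")
model.eval()

filler = "The sky was clear and the day went on as usual. " * 15
needle = "The secret passcode is 71432."
prompt = filler + needle + " " + filler + "What is the secret passcode? The secret passcode is"

inputs = tokenizer(prompt, return_tensors="pt")
ids = inputs["input_ids"][0].tolist()

# Locate the needle's token span; the leading space matters for BPE tokenizers.
needle_ids = tokenizer(" " + needle, add_special_tokens=False)["input_ids"]
start = next(i for i in range(len(ids) - len(needle_ids) + 1)
             if ids[i:i + len(needle_ids)] == needle_ids)
span = list(range(start, start + len(needle_ids)))

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a per-layer tuple of (batch, heads, query, key) tensors.
# Score each head by the attention mass its last query token puts on the needle.
scores = {}
for layer, attn in enumerate(out.attentions):
    mass = attn[0, :, -1, span].sum(dim=-1)  # one score per head
    for head, s in enumerate(mass.tolist()):
        scores[(layer, head)] = s

for (layer, head), s in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"layer {layer:2d} head {head:2d}  attention mass on needle: {s:.3f}")
```

The paper's own analysis identifies retrieval heads across many needle-in-a-haystack prompts during decoding; this single-prompt readout only gestures at the same idea, surfacing the heads that attend most strongly to the hidden span.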
