April 25, 2024, 5:44 p.m. | João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

cs.CL updates on arXiv.org

arXiv:2404.15420v1 Announce Type: new
Abstract: In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the …
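
To make the storage concern concrete, here is a rough back-of-the-envelope sketch of why cached transformer states can rival the model parameters in size. It assumes a Llama-2-7B-like decoder (32 layers, 32 key/value heads of dimension 128, fp16 storage); these figures are illustrative assumptions, not values taken from the paper.

```python
# Back-of-the-envelope KV-cache sizing for a decoder-only transformer.
# All model figures below are assumptions (Llama-2-7B-like), not from the paper.

N_LAYERS = 32      # transformer blocks
N_KV_HEADS = 32    # key/value heads (no grouped-query attention assumed)
HEAD_DIM = 128     # per-head dimension
BYTES = 2          # fp16/bf16 storage
N_PARAMS = 7e9     # parameter count

def kv_cache_bytes(context_tokens: int) -> float:
    """Keys + values cached for every layer, head, and token."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES * context_tokens

param_bytes = N_PARAMS * BYTES  # ~14 GB of fp16 weights

for ctx in (4_096, 16_384, 32_768):
    cache_gb = kv_cache_bytes(ctx) / 1e9
    print(f"{ctx:>6} tokens -> KV cache ~{cache_gb:5.1f} GB "
          f"({cache_gb / (param_bytes / 1e9):.0%} of parameter memory)")
```

Under these assumptions, a context of a few tens of thousands of tokens produces a key/value cache that approaches or exceeds the roughly 14 GB occupied by the fp16 weights themselves, which is the storage pressure the abstract alludes to.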
