Dodo: Dynamic Contextual Compression for Decoder-only LMs
June 14, 2024, 4:42 a.m. | Guanghui Qin, Corby Rosset, Ethan C. Chau, Nikhil Rao, Benjamin Van Durme
cs.CL updates on arXiv.org arxiv.org
Abstract: Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLaMA can be adapted to Dodo by efficient parameter tuning methods such as LoRA. In use, …
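To make the compression idea concrete, here is a minimal, hypothetical PyTorch sketch of the general approach the abstract describes: score the per-token hidden states at a layer and keep only a dynamic fraction of them as the compressed context, so self-attention runs over far fewer states. The module name, the learned linear scorer, and the compress_ratio parameter are illustrative assumptions, not Dodo's actual mechanism.

# Illustrative sketch only: the scoring-and-top-k selection below is an
# assumed stand-in for Dodo's dynamic compression, inferred from the abstract.
import torch
import torch.nn as nn

class ContextCompressor(nn.Module):
    """Keep a dynamic fraction of per-token hidden states as the context."""

    def __init__(self, d_model: int, compress_ratio: float = 0.1):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # learned importance score per token
        self.compress_ratio = compress_ratio

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        batch, seq_len, d_model = hidden.shape
        k = max(1, int(seq_len * self.compress_ratio))   # dynamic state count
        scores = self.scorer(hidden).squeeze(-1)         # (batch, seq_len)
        top = scores.topk(k, dim=-1).indices             # pick k states
        top = top.sort(dim=-1).values                    # preserve token order
        idx = top.unsqueeze(-1).expand(-1, -1, d_model)
        return hidden.gather(1, idx)                     # (batch, k, d_model)

if __name__ == "__main__":
    x = torch.randn(2, 1024, 768)            # a 1024-token context
    compressed = ContextCompressor(768)(x)
    print(compressed.shape)                  # torch.Size([2, 102, 768])
    # Self-attention over 102 states instead of 1024 costs roughly 1% of
    # the original O(n^2) attention time and memory.

At a 0.1 compression ratio, attention over the kept states scales as (0.1n)^2 rather than n^2, which is the "fraction of typical time and space" the abstract refers to.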