June 14, 2024, 4:42 a.m. | Guanghui Qin, Corby Rosset, Ethan C. Chau, Nikhil Rao, Benjamin Van Durme

cs.CL updates on arXiv.org

arXiv:2310.02409v2 Announce Type: replace
Abstract: Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of the typical time and space. Moreover, off-the-shelf models such as LLaMA can be adapted to Dodo by parameter-efficient tuning methods such as LoRA. In use, …
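The abstract's core idea is to let later layers attend over far fewer hidden states than input tokens. A minimal sketch of that idea is below, using a fixed-size cross-attention pooling rather than Dodo's dynamic per-layer selection; the class name `ContextCompressor` and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ContextCompressor(nn.Module):
    """Hypothetical sketch: pool T token states into K << T summary states via
    cross-attention, so downstream self-attention runs over K states instead of T.
    (Dodo itself uses a *dynamic* number of states per layer; this is fixed-K.)"""
    def __init__(self, d_model: int, num_summaries: int, num_heads: int = 8):
        super().__init__()
        # Learned query vectors that pool the full token sequence.
        self.queries = nn.Parameter(torch.randn(num_summaries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, T, d_model) -> compressed: (batch, K, d_model)
        batch = hidden_states.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        compressed, _ = self.attn(q, hidden_states, hidden_states)
        return compressed

# Usage: 1024 token states pooled into 64 summary states, shrinking the
# quadratic self-attention cost by roughly (1024 / 64)^2 = 256x.
compressor = ContextCompressor(d_model=512, num_summaries=64)
tokens = torch.randn(2, 1024, 512)
print(compressor(tokens).shape)  # torch.Size([2, 64, 512])
```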

