March 14, 2024, 4:48 a.m. | Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan

cs.CL updates on arXiv.org

arXiv:2403.08312v1 Announce Type: new
Abstract: Standard Large Language Models (LLMs) struggle to handle dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special End-of-Utterance (EoU) token in dialogues has the potential to aggregate information. We refer to the EoU tokens as "conversational attention sinks" (conv-attn sinks). Accordingly, we introduce StreamingDialogue, which compresses long dialogue history into conv-attn sinks with minimal losses, and thus reduces computational complexity quadratically …
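The core idea lends itself to a short illustration. The PyTorch sketch below is not the authors' implementation; the function name, the `eou_id` parameter, and the `window` size are illustrative assumptions. It builds a sparse attention mask that keeps dense attention over a recent window while restricting attention to the distant past to the EoU "conv-attn sink" positions, so each query attends to roughly window + #utterances keys instead of the full context.

```python
import torch

def conv_attn_sink_mask(token_ids: torch.Tensor, eou_id: int, window: int = 64) -> torch.Tensor:
    """Boolean mask (True = may attend) sketching the conv-attn-sink pattern:
    dense attention inside a recent window, plus EoU sink columns for the
    distant past. `eou_id` and `window` are illustrative assumptions."""
    seq_len = token_ids.size(0)
    i = torch.arange(seq_len).unsqueeze(1)        # query positions (rows)
    j = torch.arange(seq_len).unsqueeze(0)        # key positions (columns)
    causal = j <= i                               # never attend to future tokens
    recent = (i - j) < window                     # dense attention over the local window
    is_sink = (token_ids == eou_id).unsqueeze(0)  # columns holding EoU sinks
    return causal & (recent | is_sink)

# Toy dialogue: three utterances separated by an EoU token (id 2 here).
ids = torch.tensor([5, 6, 7, 2, 8, 9, 2, 10, 11, 12, 2])
mask = conv_attn_sink_mask(ids, eou_id=2, window=4)
# Apply as usual: scores.masked_fill(~mask, float('-inf')) before softmax.
# Each query now sees O(window + num_utterances) keys instead of O(seq_len).
```

Under this pattern, cached keys and values for non-sink tokens outside the window can be dropped entirely, which is where the memory and compute savings the abstract describes would come from.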
