all AI news
[R] Efficient Streaming Language Models with Attention Sinks - Meta AI 2023 - StreamingLLM enables Llama-2, Falcon and Pythia to have an infinite context length without any fine-tuning! Allows streaming use of LLMs!
Oct. 2, 2023, 7:09 p.m. | /u/Singularian2501
Machine Learning www.reddit.com
Github: [https://github.com/mit-han-lab/streaming-llm](https://github.com/mit-han-lab/streaming-llm)
Abstract:
>Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails …
abstract applications attention caching challenges decoding dialogue interactions language language models large language large language models llms machinelearning major memory popular stage streaming tokens training value
More from www.reddit.com / Machine Learning
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
AIML - Sr Machine Learning Engineer, Data and ML Innovation
@ Apple | Seattle, WA, United States
Senior Data Engineer
@ Palta | Palta Cyprus, Palta Warsaw, Palta remote