Oct. 2, 2023, 7:09 p.m. | /u/Singularian2501

Machine Learning | www.reddit.com

Paper: [https://arxiv.org/abs/2309.17453](https://arxiv.org/abs/2309.17453)

GitHub: [https://github.com/mit-han-lab/streaming-llm](https://github.com/mit-han-lab/streaming-llm)

Abstract:

>Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails …

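The paper's remedy for the failure of plain window attention is to keep the KV states of a few initial "attention sink" tokens in addition to the sliding window of recent tokens, so the cache stays bounded during streaming decoding. Below is a minimal sketch of that cache-eviction idea in plain PyTorch; the names (`SinkKVCache`, `n_sink`, `window_size`) are hypothetical and not the repo's actual API.

```python
# Sketch of a StreamingLLM-style KV cache: always keep the first few
# "attention sink" tokens plus a sliding window of the most recent tokens.
# Names and sizes are illustrative, not taken from the streaming-llm repo.
import torch


class SinkKVCache:
    def __init__(self, n_sink: int = 4, window_size: int = 1020):
        self.n_sink = n_sink            # initial tokens always kept
        self.window_size = window_size  # recent tokens kept
        self.k = None                   # [batch, heads, seq, head_dim]
        self.v = None

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        # Concatenate the new decoding step's KV states along the sequence dim.
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        self._evict()

    def _evict(self) -> None:
        seq_len = self.k.size(2)
        budget = self.n_sink + self.window_size
        if seq_len <= budget:
            return
        # Drop tokens from the middle: keep sink tokens and the recent window.
        self.k = torch.cat([self.k[:, :, : self.n_sink],
                            self.k[:, :, seq_len - self.window_size:]], dim=2)
        self.v = torch.cat([self.v[:, :, : self.n_sink],
                            self.v[:, :, seq_len - self.window_size:]], dim=2)


# Usage: feed per-step KV states during decoding; the cache size stays bounded.
cache = SinkKVCache(n_sink=4, window_size=8)
for _ in range(20):
    k_step = torch.randn(1, 2, 1, 16)  # [batch, heads, 1 new token, head_dim]
    v_step = torch.randn(1, 2, 1, 16)
    cache.append(k_step, v_step)
print(cache.k.shape)  # torch.Size([1, 2, 12, 16]) -> 4 sink + 8 recent tokens
```

The actual implementation also reassigns positions within the cache (rather than reusing the original token positions) when applying rotary embeddings; that detail is omitted from this sketch.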
