QAQ: Quality Adaptive Quantization for LLM KV Cache
March 8, 2024, 5:47 a.m. | Shichen Dong, Wen Cheng, Jiayu Qin, Wei Wang
cs.CL updates on arXiv.org
Abstract: The emergence of LLMs has ignited a fresh surge of breakthroughs in NLP applications, particularly in domains such as question-answering systems and text generation. As the need for longer context grows, a significant bottleneck in model deployment emerges due to the linear expansion of the Key-Value (KV) cache with the context length. Existing methods primarily rely on various hypotheses, such as sorting the KV cache based on attention scores for replacement or eviction, to compress …
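Although the abstract is truncated, the general technique it points at, shrinking the KV cache by quantizing cached keys and values while spending more bits on tokens judged more important, can be illustrated with a short sketch. Everything below is an assumption for illustration: the bit-allocation rule, the per-token importance proxy, and the function names are not the QAQ algorithm from the paper, just a minimal mixed-precision KV-cache quantizer.

import numpy as np

def quantize(x: np.ndarray, num_bits: int):
    """Uniform symmetric quantization; returns int codes plus a dequant scale."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max(np.abs(x).max(), 1e-8) / qmax      # guard against an all-zero tensor
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale

def compress_kv(keys, values, importance, high_bits=8, low_bits=4, keep_frac=0.2):
    """Quantize each token's K/V entry: the top `keep_frac` fraction of tokens
    by `importance` (e.g. accumulated attention mass -- an assumed proxy) get
    `high_bits`; the rest get `low_bits`, stored in int8 containers."""
    seq_len = keys.shape[0]
    cutoff = np.quantile(importance, 1.0 - keep_frac)
    cache = []
    for t in range(seq_len):
        bits = high_bits if importance[t] >= cutoff else low_bits
        k_codes, k_scale = quantize(keys[t], bits)
        v_codes, v_scale = quantize(values[t], bits)
        cache.append((bits, k_codes, k_scale, v_codes, v_scale))
    return cache

def decompress_kv(cache):
    keys = np.stack([dequantize(k, ks) for _, k, ks, _, _ in cache])
    values = np.stack([dequantize(v, vs) for _, _, _, v, vs in cache])
    return keys, values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, head_dim = 128, 64
    K = rng.normal(size=(seq_len, head_dim)).astype(np.float32)
    V = rng.normal(size=(seq_len, head_dim)).astype(np.float32)
    imp = rng.random(seq_len)                      # stand-in for attention scores
    cache = compress_kv(K, V, imp)
    K_hat, V_hat = decompress_kv(cache)
    print("mean |K - K_hat|:", np.abs(K - K_hat).mean())

The point of the sketch is the memory trade-off the abstract describes: a float32 cache costs 4 bytes per element and grows linearly with context length, whereas a mostly-4-bit cache cuts that by roughly 8x at the cost of quantization error concentrated on the tokens deemed least important.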