Nov. 24, 2023, 6:06 p.m. | /u/CatfishJones96

r/MachineLearning (www.reddit.com)

I’m struggling to understand something about [transformer inference arithmetic](https://kipp.ly/transformer-inference-arithmetic/) with KV caching, and how it squares with some benchmarking results.

**How is it that the latency to decode 1 new token is constant, independent of the total sequence length (input + output)?**

Let’s assume batch size 1 and simple multi-head attention. At each step t, even though we avoid recomputing K and V for the entire sequence, we still have to compute attention between the current token’s Q and a growing KV cache, which means more FLOPs …
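
To make the question concrete, here is the rough per-step arithmetic I have in mind. The model sizes (`d_model`, `n_layers`, `n_params`) are made-up placeholders for a ~13B-class model, and this ignores softmax and other small terms:

```python
# Back-of-the-envelope FLOPs for decoding ONE new token at step t with a KV cache.
# Model sizes below are assumptions for illustration only.

d_model = 5120       # hidden size (assumed)
n_layers = 40        # number of transformer blocks (assumed)
n_params = 13e9      # total parameter count (assumed)

def decode_flops(t: int) -> tuple[float, float]:
    """FLOPs to generate one token when the KV cache already holds t tokens."""
    # Matmuls against the weights (QKV/output projections + MLP):
    # ~2 FLOPs per parameter per token, independent of t.
    weight_flops = 2 * n_params
    # Attention against the cache: the QK^T scores plus the weighted sum over V
    # cost ~2 * 2 * d_model FLOPs per cached token per layer, so this grows with t.
    attention_flops = 4 * n_layers * d_model * t
    return weight_flops, attention_flops

for t in (128, 1024, 8192):
    w, a = decode_flops(t)
    print(f"t={t:5d}  weights: {w:.2e}  attention-over-cache: {a:.2e}  ratio: {a / w:.3f}")
```

If that arithmetic is right, the part that grows with t stays a small fraction of the fixed per-token weight work until the context gets quite long, and my understanding is that the decode step tends to be bound by reading the weights from memory anyway; but that is exactly the part I’d like someone to sanity-check against the benchmarks.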

