Feb. 26, 2024, 6:11 p.m. | /u/victordion

Machine Learning www.reddit.com

I see a lot of literature mentioning the use of a KV cache in transformer decoders to reduce compute. But as I understand it, once the sequence reaches the maximum context length and each left shift pushes the left-most token out of scope, wouldn't the KV cache lose validity, since a token that previously participated in attention has vanished? Is that correct?
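A toy single-head sketch of the scenario the question describes (all names, weights, and dimensions here are illustrative, not any real model's). At the first layer, each token's K/V depend only on that token's own embedding, so evicting the oldest token leaves the surviving cache entries identical to a fresh recompute. At a second layer, however, the inputs are layer-1 attention outputs that once attended to the evicted token, so the cached K/V there diverge from what a from-scratch recompute over the truncated window would produce:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4   # toy head dimension
T = 6   # current sequence length
W = 4   # sliding-window / max context size

# Hypothetical projection matrices for one attention head.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attn(x):
    """Causal single-head self-attention over the rows of x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((len(x), len(x)), dtype=bool), 1)
    scores[mask] = -np.inf  # each position sees only itself and the past
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

x = rng.standard_normal((T, d))  # toy token embeddings

# Layer 1: K for token i is just x[i] @ Wk, so the cached entries for the
# surviving window equal a fresh recompute on the truncated sequence.
k_full = x @ Wk
k_trunc = x[T - W:] @ Wk
print(np.allclose(k_full[T - W:], k_trunc))   # layer-1 cache still exact

# Layer 2: its inputs are layer-1 outputs, which for early window tokens
# attended to the now-evicted token 0. Cached layer-2 K differs from a
# recompute that never saw token 0.
h_full = attn(x)             # layer-1 outputs with token 0 present
h_trunc = attn(x[T - W:])    # layer-1 outputs recomputed without it
cached_k2 = h_full[T - W:] @ Wk
fresh_k2 = h_trunc @ Wk
print(np.allclose(cached_k2, fresh_k2))       # the two now diverge
```

So the cached vectors themselves do not change when a token is evicted; they simply no longer match what recomputation over the shortened context would yield, which is the mismatch the question is asking about.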
