Feb. 26, 2024, 6:11 p.m. | /u/victordion

Machine Learning www.reddit.com

I see a lot of literature mentioning the use of a KV cache in transformer decoders to reduce compute. But here's my understanding: once the sequence reaches the maximum context length, each left shift pushes the left-most token out of scope, and at that point the KV cache would lose validity, because a token that previously participated in the cached keys and values has vanished. Is that correct?
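To make the setup concrete, here is a minimal toy sketch of the situation I mean (numpy, single head, no position encoding; names like `max_ctx` and `decode_step` are my own, not from any real library). Cached K/V are reused at every step, and once the sequence hits the maximum context length the left-most entry is simply evicted:

```python
# Toy single-head decoder step with a fixed-size KV cache (illustrative only).
import numpy as np

d = 8          # head dimension (toy value)
max_ctx = 4    # maximum context length / cache size

rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

k_cache, v_cache = [], []   # the KV cache: one (d,) vector per past token

def decode_step(h):
    """One decode step: reuse cached K/V, project only the new token."""
    q = h @ W_q
    k_cache.append(h @ W_k)
    v_cache.append(h @ W_v)
    if len(k_cache) > max_ctx:
        # Sliding window: the left-most token falls out of scope, so its
        # cached K/V are evicted. The *remaining* cached entries were
        # computed while that token was still in context, so they are not
        # identical to what a from-scratch recompute over the truncated
        # window would give -- which is exactly the concern in my question.
        k_cache.pop(0)
        v_cache.pop(0)
    K = np.stack(k_cache)               # (t, d)
    V = np.stack(v_cache)               # (t, d)
    scores = K @ q / np.sqrt(d)         # (t,)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ V                     # (d,) attention output

# Feed a few toy hidden states; after max_ctx steps the window starts sliding.
for step in range(6):
    out = decode_step(rng.standard_normal(d))
    print(step, len(k_cache), out[:3])
```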

