Sept. 15, 2023, 12:14 p.m. | /u/30299578815310


Relevant Paper: [2307.08621.pdf (arxiv.org)](https://arxiv.org/pdf/2307.08621.pdf)

So the definition of the recurrent representation of the retention mechanism is below:

>S_n = γ S_{n−1} + K_n^⊺ V_n
>
>Retention(X_n) = Q_n S_n,  n = 1, ⋯, |x|

γ is a decay factor, and K, Q, and V have their standard transformer definitions.
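For concreteness, here is a minimal NumPy sketch of that recurrence (not the paper's code). It assumes, as in the paper's parallel form, that for each token n, Q_n and K_n are 1×d_k row vectors and V_n is a 1×d_v row vector, so K_n^⊺ V_n is a d_k×d_v outer product and the state S_n is a matrix; the dimensions and variable names are purely illustrative.

```python
import numpy as np

d_k, d_v, seq_len = 4, 4, 6
gamma = 0.9  # decay factor

rng = np.random.default_rng(0)
Q = rng.standard_normal((seq_len, d_k))  # one row vector per token
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_v))

S = np.zeros((d_k, d_v))  # S_0: the recurrent state is a d_k x d_v matrix
outputs = []
for n in range(seq_len):
    # S_n = gamma * S_{n-1} + K_n^T V_n  (outer-product state update)
    S = gamma * S + np.outer(K[n], V[n])
    # Retention(X_n) = Q_n S_n  -> one 1 x d_v row vector per token
    outputs.append(Q[n] @ S)

out = np.stack(outputs)  # shape (seq_len, d_v)
print(out.shape)
```

Unrolling the recurrence with S_0 = 0 gives S_n = Σ_{m=1}^{n} γ^{n−m} K_m^⊺ V_m, so Retention(X_n) = Q_n Σ_{m=1}^{n} γ^{n−m} K_m^⊺ V_m, which is how the recurrent form lines up with the paper's parallel form.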

What confuses me is the derivation of S_n. The formula makes it look like a scalar. But if that's the case, are we saying that for a given token, the retention …
