March 1, 2024, 5:42 a.m. | Vivien Cabannes, Berfin Simsek, Alberto Bietti

cs.LG updates on arXiv.org

arXiv:2402.18724v1 Announce Type: new
Abstract: This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic growth of the "classification margins." Yet, we show that imbalance in token frequencies and memory interferences due to …
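
To make the phrase "associative memory module storing outer products of token embeddings" concrete, here is a minimal sketch, not the authors' code. It assumes a memory matrix of the form W = sum over stored pairs (x, y) of u_y e_x^T, where e_x and u_y are input and output token embeddings, and it assumes a common definition of the classification margin (score of the correct output minus the best competing score). The names `build_memory`, `recall`, and `margin` are hypothetical, introduced only for illustration.

```python
import numpy as np


def build_memory(pairs, E, U):
    """Sum of outer products u_y e_x^T over stored (x, y) pairs.

    E has shape (n_inputs, d): row x is the input embedding e_x.
    U has shape (n_outputs, d): row y is the output embedding u_y.
    (Assumed layout, chosen for this sketch.)
    """
    d = E.shape[1]
    W = np.zeros((d, d))
    for x, y in pairs:
        W += np.outer(U[y], E[x])
    return W


def recall(W, E, U, x):
    """Return the output token whose embedding best matches W e_x."""
    scores = U @ (W @ E[x])
    return int(np.argmax(scores))


def margin(W, E, U, x, y):
    """Score of the correct output y minus the best competing score.

    This is an assumed, standard notion of margin, not necessarily the
    exact quantity analyzed in the paper.
    """
    scores = U @ (W @ E[x])
    others = np.delete(scores, y)
    return float(scores[y] - others.max())


if __name__ == "__main__":
    # Random (non-orthogonal) embeddings: correlations between them can
    # cause interference between stored associations.
    rng = np.random.default_rng(0)
    n, d = 20, 16
    E = rng.standard_normal((n, d)) / np.sqrt(d)
    U = rng.standard_normal((n, d)) / np.sqrt(d)
    pairs = [(x, (x + 1) % n) for x in range(n)]
    W = build_memory(pairs, E, U)
    accuracy = np.mean([recall(W, E, U, x) == y for x, y in pairs])
    print("recall accuracy:", accuracy)
```

With orthonormal embeddings and each input stored once, W e_x equals u_y exactly, so recall is perfect and every margin is 1. With correlated embeddings, the cross terms e_x' · e_x no longer vanish, which is the kind of interference between memories the abstract refers to.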
