March 5, 2024, 2:44 p.m. | Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis

cs.LG updates on arXiv.org

arXiv:2306.02010v3 Announce Type: replace
Abstract: Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive. This paper investigates the memorization abilities of multi-head attention mechanisms, examining how many example sequences they can memorize, as a function of the number of heads and sequence length. Motivated by experimental findings on vision transformers, we introduce novel assumptions about the linear independence of input data, distinct from the commonly used general-position assumption. Under …
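The setting described in the abstract (a single multi-head attention layer asked to memorize a set of example sequences, under an assumption about the linear independence of the input data) can be illustrated with a minimal NumPy sketch. The dimensions, the toy attention layer, and the rank check below are illustrative assumptions for this post, not the paper's actual construction or its precise assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not taken from the paper): n example sequences,
# sequence length T, model dimension d, and h attention heads.
n, T, d, h = 4, 4, 32, 2
d_head = d // h

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """One multi-head self-attention layer applied to a sequence X of shape (T, d)."""
    outputs = []
    for i in range(h):
        Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]            # each (T, d_head)
        scores = Q @ K.T / np.sqrt(d_head)                    # (T, T) attention logits
        A = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A = A / A.sum(axis=-1, keepdims=True)                 # softmax over keys
        outputs.append(A @ V)                                 # per-head output (T, d_head)
    return np.concatenate(outputs, axis=-1) @ Wo              # concatenate heads, project to (T, d)

# Random parameters and a batch of n example sequences to be memorized.
Wq = rng.standard_normal((h, d, d_head))
Wk = rng.standard_normal((h, d, d_head))
Wv = rng.standard_normal((h, d, d_head))
Wo = rng.standard_normal((d, d))
X = rng.standard_normal((n, T, d))

Y = np.stack([multi_head_attention(x, Wq, Wk, Wv, Wo) for x in X])
print("output shape:", Y.shape)  # (n, T, d)

# Illustrative linear-independence-style check on the inputs: with n * T <= d,
# the stacked token embeddings can (and here do) have full row rank.
tokens = X.reshape(n * T, d)
print("token matrix rank:", np.linalg.matrix_rank(tokens), "of", n * T)
```

The rank check is only a stand-in for the kind of input condition the abstract alludes to; how many such sequences a layer with h heads can actually memorize is precisely the question the paper analyzes.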
