all AI news
Memorization Capacity of Multi-Head Attention in Transformers
March 5, 2024, 2:44 p.m. | Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis
cs.LG updates on arXiv.org arxiv.org
Abstract: Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive. This paper investigates the memorization abilities of multi-head attention mechanisms, examining how many example sequences they can memorize, as a function of the number of heads and sequence length. Motivated by experimental findings on vision transformers, we introduce novel assumptions about the linear independence of input data, distinct from the commonly used general-position assumption. Under …
abstract architecture arxiv attention attention mechanisms become capacity cs.lg example function head language multi-head multi-head attention paper tasks transformers type vision
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Senior Software Engineer, Generative AI (C++)
@ SoundHound Inc. | Toronto, Canada