Aug. 2, 2022, 2:10 a.m. | Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang

cs.LG updates on arXiv.org

Transformers have achieved remarkable success in sequence modeling and beyond,
but they suffer from quadratic computational and memory complexity with respect
to the length of the input sequence. Efficient transformers, leveraging
techniques such as sparse and linear attention and hashing tricks, have been
proposed to reduce this quadratic complexity, but they significantly degrade
accuracy. In response, we first interpret the linear attention and residual
connections in computing the attention map as gradient descent steps. We then
introduce momentum into these …
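Since the abstract is truncated, the sketch below only illustrates the general idea it describes: a linear-attention layer whose residual update x_{l+1} = x_l + Attn(x_l) is read as one gradient-descent-like step, and a heavy-ball-style momentum variant of that update. The feature map (ELU + 1), the momentum coefficient beta, and all names (phi, momentum_step, etc.) are illustrative assumptions, not the paper's exact construction.

    # Minimal NumPy sketch, under the assumptions stated above.
    import numpy as np

    def phi(x):
        # ELU + 1 feature map, a common (assumed) choice for linear attention.
        return np.where(x > 0, x + 1.0, np.exp(x))

    def linear_attention(x, Wq, Wk, Wv):
        # O(n * d^2) linear attention: softmax(Q K^T) V is replaced by phi(Q) (phi(K)^T V).
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        qf, kf = phi(q), phi(k)
        kv = kf.T @ v                      # (d, d) summary of keys and values
        z = qf @ kf.sum(axis=0)            # per-query normalizer
        return (qf @ kv) / (z[:, None] + 1e-6)

    def residual_step(x, Wq, Wk, Wv):
        # Plain residual update x_{l+1} = x_l + Attn(x_l), read as one gradient step.
        return x + linear_attention(x, Wq, Wk, Wv)

    def momentum_step(x, p, Wq, Wk, Wv, beta=0.9):
        # Heavy-ball-style variant: accumulate past updates in p before stepping.
        p = beta * p + linear_attention(x, Wq, Wk, Wv)
        return x + p, p

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, d = 16, 8
        x = rng.standard_normal((n, d))
        Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
        p = np.zeros_like(x)
        for _ in range(4):                 # stack a few "layers" sharing weights
            x, p = momentum_step(x, p, Wq, Wk, Wv)
        print(x.shape)                     # (16, 8)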

arxiv, attention, gap, lg, linearization, performance, self-attention, transformer
