Web: http://arxiv.org/abs/2202.06258

June 17, 2022, 1:11 a.m. | Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long

cs.LG updates on arXiv.org

Transformers based on the attention mechanism have achieved impressive
success in various areas. However, the attention mechanism has quadratic
complexity, significantly impeding Transformers from handling numerous
tokens and scaling up to larger models. Previous methods mainly utilize
similarity decomposition and the associativity of matrix multiplication to
devise linear-time attention mechanisms. They avoid the degeneration of
attention into a trivial distribution by reintroducing inductive biases such
as locality, but at the expense of model generality and expressiveness. In …
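The "similarity decomposition plus associativity" trick mentioned in the abstract is the standard route to linear attention: approximate the similarity as a dot product of feature maps and reassociate the matrix product so the n x n attention matrix is never formed. Below is a minimal NumPy sketch of that generic trick, not of this paper's own attention mechanism; the feature map `phi` (elu + 1) and the function names are illustrative assumptions borrowed from common linear-attention practice.

```python
import numpy as np

def phi(x):
    # Non-negative feature map commonly used in linear attention
    # (assumption: elu(x) + 1; not necessarily what this paper uses).
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(Q, K, V):
    # Standard softmax attention: O(n^2 d) time, O(n^2) memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Similarity decomposition sim(q, k) ~= phi(q) . phi(k) plus
    # associativity: (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V),
    # so the n x n attention matrix is never materialized: O(n d^2).
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                      # (d, d_v) summary of keys and values
    normalizer = Qp @ Kp.sum(axis=0)   # (n,) row-wise normalization
    return (Qp @ kv) / normalizer[:, None]

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)        # (n, d), no n x n matrix ever built
```

Without extra constraints, such decompositions can collapse toward a nearly uniform attention distribution, which is why prior work reintroduces biases like locality; avoiding that trade-off is the motivation the abstract sets up.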

arxiv conservation lg transformers
