March 5, 2024, 2:42 p.m. | Mehran Hosseini, Peyman Hosseini

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.01643v1 Announce Type: new
Abstract: We introduce three new attention mechanisms that outperform standard multi-head attention in terms of efficiency and learning capabilities, thereby improving the performance and broader deployability of Transformer models. Our first contribution is Optimised Attention, which performs similarly to standard attention, but has 3/4 as many parameters and one matrix multiplication fewer per head. Next, we introduce Efficient Attention, which performs on par with standard attention with only 1/2 as many parameters as many parameters and …
