[R] Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Nov. 21, 2023, 4:29 a.m. | /u/APaperADay
Machine Learning | www.reddit.com
**Code**: [https://github.com/vulus98/Rethinking-attention](https://github.com/vulus98/Rethinking-attention)
**Abstract**:
>This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of …
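The linked repository holds the actual training code; as a rough, hypothetical sketch of the idea described in the abstract, the PyTorch snippet below swaps a self-attention layer for a shallow feed-forward network that operates on a flattened fixed-length sequence, and distills it against the frozen attention layer's outputs with an MSE loss. The names (`ShallowFFNAttentionReplacement`, `distill_step`), the hidden width, and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ShallowFFNAttentionReplacement(nn.Module):
    """Shallow FFN meant to stand in for a self-attention sublayer
    over fixed-length inputs (hypothetical sketch, not the repo's code)."""

    def __init__(self, seq_len: int, d_model: int, hidden: int = 1024):
        super().__init__()
        self.seq_len, self.d_model = seq_len, d_model
        # Flatten the whole sequence so a single MLP can mix
        # information across positions, as attention would.
        self.net = nn.Sequential(
            nn.Flatten(start_dim=1),                 # (B, L, D) -> (B, L*D)
            nn.Linear(seq_len * d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, seq_len * d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).view(-1, self.seq_len, self.d_model)


def distill_step(student, teacher_attn, x, optimizer):
    """One knowledge-distillation step: regress the student's output
    onto the frozen teacher attention layer's output."""
    with torch.no_grad():
        target, _ = teacher_attn(x, x, x)  # teacher's attention output
    loss = nn.functional.mse_loss(student(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical usage with a stock PyTorch attention layer as the teacher.
seq_len, d_model = 128, 512
teacher = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True).eval()
student = ShallowFFNAttentionReplacement(seq_len, d_model)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
x = torch.randn(32, seq_len, d_model)  # stand-in batch of encoder states
print(distill_step(student, teacher, x, opt))
```

Note the design constraint this sketch makes visible: because the FFN's input size is tied to `seq_len * d_model`, such a replacement only works for sequences padded to a fixed maximum length, unlike attention, which handles variable-length inputs natively.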