[R] Simplifying Transformer Blocks
Nov. 13, 2023, 7:27 p.m. | /u/APaperADay
Machine Learning www.reddit.com
**GitHub**: [https://github.com/bobby-he/simplified\_transformers](https://github.com/bobby-he/simplified_transformers)
**Abstract**:
>A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections & normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable.
>In this work, we ask: to what extent can the standard transformer block be simplified? Combining signal propagation theory and empirical …
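To make the "precise arrangement" the abstract describes concrete, here is a minimal sketch of a standard pre-LN transformer block in plain numpy: attention and MLP sub-blocks, each wrapped in a skip connection with a normalisation layer in front. This is an illustrative assumption, not the paper's code (the authors' simplified variants live in the linked repository); single-head attention and a ReLU MLP are used for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each token vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(x, Wq, Wk, Wv, Wo):
    # Single-head self-attention, for illustration only.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return (scores @ v) @ Wo

def mlp(x, W1, W2):
    # Two-layer ReLU MLP sub-block.
    return np.maximum(x @ W1, 0) @ W2

def transformer_block(x, params):
    # Pre-LN arrangement: norm -> sub-block -> skip connection,
    # once for attention and once for the MLP.
    x = x + attention(layer_norm(x), *params["attn"])
    x = x + mlp(layer_norm(x), *params["mlp"])
    return x

rng = np.random.default_rng(0)
seq, d = 4, 8  # toy sizes: 4 tokens, model width 8
params = {
    "attn": [rng.standard_normal((d, d)) * 0.1 for _ in range(4)],
    "mlp": [rng.standard_normal((d, 4 * d)) * 0.1,
            rng.standard_normal((4 * d, d)) * 0.1],
}
x = rng.standard_normal((seq, d))
y = transformer_block(x, params)
print(y.shape)  # (4, 8): the block preserves the (tokens, width) shape
```

Identical copies of this block are stacked to form a deep Transformer; the paper asks which of these interleaved components (skips, norms, even value and projection matrices) can be removed without hurting trainability.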