Nov. 13, 2023, 7:27 p.m. | /u/APaperADay

Machine Learning | www.reddit.com

**Paper**: [https://arxiv.org/abs/2311.01906](https://arxiv.org/abs/2311.01906)

**GitHub**: [https://github.com/bobby-he/simplified\_transformers](https://github.com/bobby-he/simplified_transformers)

**Abstract**:

>A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections & normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable.
>In this work, we ask to what extent the standard transformer block can be simplified? Combining signal propagation theory and empirical …
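
For context on the arrangement the abstract describes, here is a minimal sketch of a standard pre-LN transformer block in PyTorch: attention and MLP sub-blocks, each preceded by normalisation and wrapped in a skip connection. This is the conventional baseline the paper sets out to simplify, not the paper's proposed block; the layer sizes and class names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreLNTransformerBlock(nn.Module):
    """Standard pre-LN transformer block: attention and MLP sub-blocks,
    each preceded by LayerNorm and wrapped in a skip connection."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sub-block with skip connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-block with skip connection
        x = x + self.mlp(self.norm2(x))
        return x


# Example: one block applied to a small batch of token embeddings
block = PreLNTransformerBlock(d_model=64, n_heads=4, d_ff=256)
tokens = torch.randn(2, 10, 64)  # (batch, sequence, d_model)
out = block(tokens)
print(out.shape)  # torch.Size([2, 10, 64])
```

The paper's question is which of these components (skip connections, normalisation layers, and parts of the attention sub-block) can be removed without hurting trainability; see the linked paper and repository for the simplified variants.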

