June 3, 2024, 4:44 a.m. | Bobby He, Thomas Hofmann

cs.LG updates on arXiv.org

arXiv:2311.01906v2 Announce Type: replace
Abstract: A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections and normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed or render models untrainable.
In this work, we ask: to what extent can the standard transformer block be simplified? Combining signal propagation theory and empirical …
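To make the "precise arrangements" the abstract refers to concrete, here is a minimal sketch of a standard pre-LN transformer block, with attention and MLP sub-blocks each wrapped in a skip connection and a normalisation layer. The hyperparameters (d_model, n_heads, mlp_ratio) are illustrative choices, not values from the paper, and this is the baseline block being discussed, not the simplified variant the authors propose.

```python
import torch
import torch.nn as nn

class StandardTransformerBlock(nn.Module):
    """Sketch of a standard pre-LN transformer block:
    attention and MLP sub-blocks, each with a skip connection
    and a normalisation layer. Hyperparameters are illustrative."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sub-block with skip connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-block with skip connection
        x = x + self.mlp(self.norm2(x))
        return x
```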

