May 27, 2024, 4:44 a.m. | Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan

cs.LG updates on arXiv.org

arXiv:2405.15712v1 Announce Type: cross
Abstract: In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite-width and infinite-depth limits while allowing the attention layers to update throughout training, a relevant notion of feature learning in these models. We then use tools from dynamical mean field theory (DMFT) to analyze various infinite limits (infinite key/query dimension, infinite heads, and infinite depth), which have …
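
As one hypothetical illustration of why the choice of parameterization matters at large width, the sketch below contrasts the standard 1/sqrt(d_k) attention-logit scaling with a muP-style 1/d_k scaling: the latter keeps logit magnitudes O(1) as the key/query dimension grows once queries and keys become correlated during training, so the attention layer can continue to adapt. The function name and the specific 1/d_k choice are illustrative assumptions, not a reproduction of the paper's exact parameterization.

```python
import numpy as np

def attention_logits(Q, K, d_k, parameterization="mup"):
    """Compute attention logits under two width scalings.

    'standard': logits scaled by 1/sqrt(d_k) (the usual transformer choice).
    'mup'     : logits scaled by 1/d_k, an illustrative muP-style choice that
                keeps logit magnitudes O(1) as d_k -> infinity when Q and K
                are correlated (as they become after training updates).
    """
    scale = 1.0 / d_k if parameterization == "mup" else 1.0 / np.sqrt(d_k)
    return scale * (Q @ K.T)

# Toy comparison: for correlated Q and K, the 1/sqrt(d_k) scaling lets the
# mean logit magnitude grow with width, while the 1/d_k scaling keeps it stable.
rng = np.random.default_rng(0)
for d_k in (64, 256, 1024, 4096):
    Q = rng.standard_normal((8, d_k))
    K = Q + 0.1 * rng.standard_normal((8, d_k))  # Q and K correlated
    for p in ("standard", "mup"):
        logits = attention_logits(Q, K, d_k, p)
        print(d_k, p, float(np.abs(logits).mean()))
```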

