May 27, 2024, 4:44 a.m. | Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan

cs.LG updates on arXiv.org

arXiv:2405.15712v1 Announce Type: cross
Abstract: In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite-width and infinite-depth limits while allowing the attention layers to update throughout training, a relevant notion of feature learning in these models. We then use tools from dynamical mean field theory (DMFT) to analyze various infinite limits (infinite key/query dimension, infinite heads, and infinite depth), which have …
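
As one hypothetical illustration of why the choice of parameterization matters at large width, the sketch below contrasts the standard 1/sqrt(d_k) attention-logit scaling with a muP-style 1/d_k scaling: the latter keeps logit magnitudes O(1) as the key/query dimension grows once queries and keys become correlated during training, so the attention layer can continue to adapt. The function name and the specific 1/d_k choice are illustrative assumptions, not a reproduction of the paper's exact parameterization.

```python
import numpy as np

def attention_logits(Q, K, d_k, parameterization="mup"):
    """Compute attention logits under two width scalings.

    'standard': logits scaled by 1/sqrt(d_k) (the usual transformer choice).
    'mup'     : logits scaled by 1/d_k, an illustrative muP-style choice that
                keeps logit magnitudes O(1) as d_k -> infinity when Q and K
                are correlated (as they become after training updates).
    """
    scale = 1.0 / d_k if parameterization == "mup" else 1.0 / np.sqrt(d_k)
    return scale * (Q @ K.T)

# Toy comparison: for correlated Q and K, the 1/sqrt(d_k) scaling lets the
# mean logit magnitude grow with width, while the 1/d_k scaling keeps it stable.
rng = np.random.default_rng(0)
for d_k in (64, 256, 1024, 4096):
    Q = rng.standard_normal((8, d_k))
    K = Q + 0.1 * rng.standard_normal((8, d_k))  # Q and K correlated
    for p in ("standard", "mup"):
        logits = attention_logits(Q, K, d_k, p)
        print(d_k, p, float(np.abs(logits).mean()))
```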

