Infinite Limits of Multi-head Transformer Dynamics
May 27, 2024 | Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan
cs.LG updates on arXiv.org
Abstract: In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite-width and infinite-depth limits while allowing the attention layers to update throughout training, a relevant notion of feature learning in these models. We then use tools from dynamical mean-field theory (DMFT) to analyze various infinite limits (infinite key/query dimension, infinite heads, and infinite depth) which have …
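As a rough illustration only (not taken from the paper), the kind of parameterization choice at stake in such infinite key/query-dimension limits can be sketched with a toy multi-head attention layer where the query-key logit scaling is configurable: the standard choice divides logits by sqrt(d_k), while mean-field-style analyses often consider 1/d_k so the logits remain O(1) as the head dimension grows. All function and variable names below are hypothetical.

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, n_heads, logit_scale):
    """Toy multi-head self-attention (no output projection).

    logit_scale(d_k) controls how query-key logits are normalized:
    the conventional choice is d_k**-0.5; limit analyses often study
    1/d_k instead, which keeps logits O(1) as d_k -> infinity.
    """
    T, d = X.shape
    d_k = d // n_heads
    heads = []
    for h in range(n_heads):
        q = X @ Wq[h]                              # (T, d_k)
        k = X @ Wk[h]                              # (T, d_k)
        v = X @ Wv[h]                              # (T, d_k)
        logits = (q @ k.T) * logit_scale(d_k)      # (T, T)
        # numerically stable row-wise softmax
        a = np.exp(logits - logits.max(axis=-1, keepdims=True))
        a /= a.sum(axis=-1, keepdims=True)
        heads.append(a @ v)
    return np.concatenate(heads, axis=-1)          # (T, d)

rng = np.random.default_rng(0)
T, d, H = 4, 16, 4
X = rng.standard_normal((T, d))
make_W = lambda: rng.standard_normal((H, d, d // H)) / np.sqrt(d)
Wq, Wk, Wv = make_W(), make_W(), make_W()

out_sqrt = multi_head_attention(X, Wq, Wk, Wv, H, lambda dk: dk ** -0.5)
out_lin = multi_head_attention(X, Wq, Wk, Wv, H, lambda dk: 1.0 / dk)
```

Comparing `out_sqrt` and `out_lin` at increasing width shows how the scaling choice changes the layer's behavior as dimensions grow; the paper's DMFT analysis characterizes which such choices yield well-defined limits with non-trivial feature learning.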