Feb. 20, 2024, 5:44 a.m. | Haotian Jiang, Qianxiao Li

cs.LG updates on arXiv.org arxiv.org

arXiv:2305.18475v2 Announce Type: replace
Abstract: The Transformer architecture is widely applied in sequence modeling applications, yet the theoretical understanding of its working principles remains limited. In this work, we investigate the approximation rate for single-layer Transformers with one head. We consider a class of non-linear relationships and identify a novel notion of complexity measures to establish an explicit Jackson-type approximation rate estimate for the Transformer. This rate reveals the structural properties of the Transformer and suggests the types of sequential …
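For readers unfamiliar with the object under study, the following is a minimal sketch of a generic single-head softmax self-attention layer in plain Python. It is an illustration under assumptions, not the paper's exact formulation: the function names (`single_head_attention`, `matvec`, `softmax`), the weight matrices `w_q`, `w_k`, `w_v`, and the scaled dot-product scoring are all hypothetical choices made here, and the feed-forward part of a Transformer layer is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def single_head_attention(xs, w_q, w_k, w_v):
    """Apply one attention head to a sequence of token vectors.

    xs is a list of input vectors; w_q, w_k, w_v are assumed
    (hypothetical) query, key, and value projection matrices.
    Returns one output vector per input position.
    """
    queries = [matvec(w_q, x) for x in xs]
    keys = [matvec(w_k, x) for x in xs]
    values = [matvec(w_v, x) for x in xs]
    scale = math.sqrt(len(keys[0]))  # standard 1/sqrt(d_k) scaling

    outputs = []
    for q in queries:
        # Similarity of this position's query to every key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Each output position is a weighted average of the value vectors over the whole input sequence, with weights determined by query-key similarity. This is the generic mechanism a single-layer, one-head Transformer uses to relate positions in a sequence.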

