all AI news
Approximation Rate of the Transformer Architecture for Sequence Modeling
Feb. 20, 2024, 5:44 a.m. | Haotian Jiang, Qianxiao Li
cs.LG updates on arXiv.org arxiv.org
Abstract: The Transformer architecture is widely applied in sequence modeling applications, yet the theoretical understanding of its working principles remains limited. In this work, we investigate the approximation rate for single-layer Transformers with one head. We consider a class of non-linear relationships and identify a novel notion of complexity measures to establish an explicit Jackson-type approximation rate estimate for the Transformer. This rate reveals the structural properties of the Transformer and suggests the types of sequential …
abstract applications approximation architecture arxiv class cs.lg head identify layer linear modeling non-linear notion novel rate relationships transformer transformer architecture transformers type understanding work
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Senior Data Engineer
@ Quantexa | Sydney, New South Wales, Australia
Staff Analytics Engineer
@ Warner Bros. Discovery | NY New York 230 Park Avenue South