March 26, 2024, 4:49 a.m. | Shokichi Takakura, Taiji Suzuki

stat.ML updates on arXiv.org

arXiv:2305.18699v1 Announce Type: cross
Abstract: Despite the great success of Transformer networks in applications such as natural language processing and computer vision, their theoretical aspects remain poorly understood. In this paper, we study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite-dimensional inputs. Although inputs and outputs are both infinite-dimensional, we show that when the target function has anisotropic smoothness, Transformers can avoid the curse of dimensionality due to their feature extraction ability …
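The abstract views Transformers as sequence-to-sequence functions. As a hedged illustration only (this is a minimal NumPy sketch, not the paper's construction or analysis), the snippet below shows single-head self-attention as such a map: the same fixed parameters define a function on sequences of any length.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention: a sequence-to-sequence map.

    X has shape (n_tokens, d); the output has one row per input
    token, so the map is defined for any sequence length.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n_tokens, n_tokens)
    return softmax(scores, axis=-1) @ V      # (n_tokens, d_v)

# Toy usage: identical weights act on sequences of different lengths.
rng = np.random.default_rng(0)
d, d_k = 8, 4
W_q, W_k, W_v = (rng.standard_normal((d, d_k)) for _ in range(3))
for n in (5, 12):
    X = rng.standard_normal((n, d))
    print(self_attention(X, W_q, W_k, W_v).shape)  # (n, 4)
```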

