Inductive Biases and Variable Creation in Self-Attention Mechanisms. (arXiv:2110.10090v2 [cs.LG] UPDATED)
cs.LG updates on arXiv.org
Self-attention, an architectural motif designed to model long-range
interactions in sequential data, has driven numerous recent breakthroughs in
natural language processing and beyond. This work provides a theoretical
analysis of the inductive biases of self-attention modules. Our focus is to
rigorously establish which functions and long-range dependencies self-attention
blocks prefer to represent. Our main result shows that bounded-norm Transformer
networks "create sparse variables": a single self-attention head can represent
a sparse function of the input sequence, with sample complexity scaling …
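The object of study — a single self-attention head mapping a token sequence to a weighted combination of value vectors — can be sketched as follows. This is a generic illustration of the mechanism, not the paper's bounded-norm construction; all names, dimensions, and weight initializations are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_head(X, Wq, Wk, Wv):
    # X: (T, d) sequence of T token embeddings of width d
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) scaled dot products
    A = softmax(scores, axis=-1)             # each row is a distribution over positions
    return A @ V                             # (T, d_k) per-position mixture of values

# illustrative dimensions, chosen arbitrarily
rng = np.random.default_rng(0)
T, d, d_k = 5, 8, 4
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
out = self_attention_head(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Because each output row is a softmax-weighted average over positions, a head whose attention weights concentrate on a few tokens computes a function of a sparse subset of the input — the "sparse variable creation" the abstract refers to.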