March 18, 2024, 4:42 a.m. | Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

cs.LG updates on arXiv.org

arXiv:2310.08391v2 Announce Type: replace-cross
Abstract: Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of …
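To make the setup concrete, below is a minimal sketch (not the paper's exact construction) of in-context linear regression with a single-layer linear attention predictor, pretrained over tasks drawn from a Gaussian prior. The parameter names (`W`, `n_context`, `d`), the learning rate, and the task counts are illustrative assumptions based only on the abstract.

```python
# Minimal sketch: pretraining a single-layer linear attention model for
# in-context linear regression with a Gaussian task prior.
# All hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_context = 5, 20           # feature dimension, context length
n_tasks = 5000                 # number of pretraining tasks

def sample_task():
    """Draw one regression task: weights from a Gaussian prior, Gaussian inputs."""
    beta = rng.normal(size=d)                   # task parameter ~ N(0, I)
    X = rng.normal(size=(n_context + 1, d))     # context inputs + one query
    y = X @ beta                                # noiseless linear labels
    return X, y

def predict(W, X, y):
    """Single-layer linear attention prediction for the query point:
       y_hat = x_query^T W (1/n) sum_i x_i y_i, with learnable d x d matrix W."""
    Xc, yc, x_q = X[:-1], y[:-1], X[-1]
    context_stat = Xc.T @ yc / n_context        # (1/n) sum_i x_i y_i
    return x_q @ W @ context_stat

# Pretrain W across many tasks by stochastic gradient descent on squared error.
W = np.zeros((d, d))
lr = 0.01
for _ in range(n_tasks):
    X, y = sample_task()
    Xc, yc, x_q, y_q = X[:-1], y[:-1], X[-1], y[-1]
    context_stat = Xc.T @ yc / n_context
    err = x_q @ W @ context_stat - y_q
    W -= lr * err * np.outer(x_q, context_stat)  # gradient of 0.5 * err^2 w.r.t. W

# After pretraining, the model solves a *new* task from its context alone,
# without any parameter updates -- the in-context learning behavior studied here.
X_new, y_new = sample_task()
print("prediction:", predict(W, X_new, y_new), "target:", y_new[-1])
```

In this toy version the only trainable object is the d x d matrix W, which mirrors the "linearly parameterized single-layer linear attention" setting described in the abstract; the statistical question the paper addresses is how many pretraining tasks such a model needs.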

