Revisiting the Hypothesis: Do pretrained Transformers Learn In-Context by Gradient Descent?
March 1, 2024, 5:44 a.m. | Lingfeng Shen, Aayush Mishra, Daniel Khashabi
cs.LG updates on arXiv.org
Abstract: The emergence of In-Context Learning (ICL) in LLMs remains a striking phenomenon that is still poorly understood. To explain ICL, recent studies attempt to connect it theoretically to Gradient Descent (GD). We ask: does this connection hold up in actual pre-trained models?
We highlight the limiting assumptions in prior work that make its setting considerably different from the practical setting in which language models are trained. For example, the theoretical hand-constructed weights used in these studies have …
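To make the hypothesis concrete, here is a minimal sketch (not the paper's experimental protocol) of the comparison at stake: a pretrained model's in-context prediction, with demonstrations placed in the prompt, versus the prediction of the same model after a few explicit gradient-descent steps on those demonstrations. The model choice (gpt2), learning rate, step count, and total-variation metric below are all illustrative assumptions.

```python
# Sketch of the ICL-vs-GD comparison: if ICL were implicitly doing GD,
# these two next-token distributions should be close.
import copy
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

demos = "great -> positive\nterrible -> negative\n"  # toy demonstrations
query = "awful -> "

def next_token_dist(m, prompt):
    """Distribution over the next token given a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = m(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

# (a) In-context learning: demonstrations live in the prompt.
p_icl = next_token_dist(model, demos + query)

# (b) Explicit gradient descent: fine-tune a copy of the model on the
# demonstrations, then predict on the bare query (no demos in the prompt).
ft = copy.deepcopy(model).train()
opt = torch.optim.SGD(ft.parameters(), lr=1e-3)  # lr and steps are assumptions
demo_ids = tok(demos, return_tensors="pt").input_ids
for _ in range(10):
    opt.zero_grad()
    loss = ft(demo_ids, labels=demo_ids).loss  # standard LM loss on the demos
    loss.backward()
    opt.step()
ft.eval()
p_gd = next_token_dist(ft, query)

# One crude agreement measure; the paper's actual metrics differ.
tvd = 0.5 * (p_icl - p_gd).abs().sum().item()
print(f"total variation distance between ICL and GD predictions: {tvd:.4f}")
```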