Breaking through the learning plateaus of in-context learning in Transformer
June 7, 2024, 4:44 a.m. | Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng
cs.LG updates on arXiv.org arxiv.org
Abstract: In-context learning, i.e., learning from context examples, is an impressive ability of Transformers. Training Transformers to acquire this in-context learning skill is computationally intensive due to the occurrence of learning plateaus, which are periods within the training process where there is minimal or no improvement in the model's in-context learning capability. To study the mechanism behind these learning plateaus, we conceptually separate a component within the model's internal representation that is exclusively affected by the …
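To make the setting concrete: in this line of work, the model is given (input, label) example pairs in its context, then a query input, and must predict the query's label without any weight update. The sketch below is illustrative only and is not the paper's setup; the toy "learner" that copies the label of the nearest context example is a hypothetical stand-in for a trained Transformer.

```python
# Toy illustration of the in-context learning setting (not the paper's method).

def format_context(examples, query):
    """Serialise few-shot (input, label) pairs plus a query into one prompt."""
    lines = [f"Input: {x} -> Label: {y}" for x, y in examples]
    lines.append(f"Input: {query} -> Label:")
    return "\n".join(lines)

def nearest_neighbour_icl(examples, query):
    """Hypothetical in-context 'learner': return the label of the context
    input closest to the query, mimicking prediction from context alone."""
    _, y_near = min(examples, key=lambda xy: abs(xy[0] - query))
    return y_near

examples = [(1.0, "small"), (10.0, "large"), (2.0, "small")]
prompt = format_context(examples, 1.5)
prediction = nearest_neighbour_icl(examples, 1.5)  # -> "small"
```

A real Transformer would consume `prompt` and emit the label token; the point is that all "learning" happens in the forward pass over the context, which is the capability whose training dynamics exhibit the plateaus studied here.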