Overthinking the Truth: Understanding how Language Models Process False Demonstrations

March 12, 2024, 4:44 a.m. | Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

cs.LG updates on arXiv.org

arXiv:2307.09476v2 Announce Type: replace
Abstract: Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: "overthinking" and "false induction heads". The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations. …
