all AI news
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
March 12, 2024, 4:44 a.m. | Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt
cs.LG updates on arXiv.org arxiv.org
Abstract: Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: "overthinking" and "false induction heads". The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations. …
abstract arxiv context cs.ai cs.cl cs.lg enabling false few-shot few-shot learning fine-tuning however language language models modern patterns process study tasks them through truth type understanding
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
#13721 - Data Engineer - AI Model Testing
@ Qualitest | Miami, Florida, United States
Elasticsearch Administrator
@ ManTech | 201BF - Customer Site, Chantilly, VA