April 4, 2024, 4:43 a.m. | Jack Merullo, Carsten Eickhoff, Ellie Pavlick

cs.LG updates on arXiv.org

arXiv:2305.16130v3 Announce Type: replace-cross
Abstract: A primary criticism towards language models (LMs) is their inscrutability. This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks using regularities encoded in the hidden space of the model (e.g., Poland:Warsaw::China:Beijing). We investigate a range of language model sizes (from 124M parameters to 176B parameters) in an in-context learning setting, and find that for a variety of tasks (involving capital …
