Feb. 21, 2024, 5:43 a.m. | Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf

cs.LG updates on arXiv.org

arXiv:2402.12865v1 Announce Type: cross
Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the models' vocabularies, helping to uncover how information flows within LMs. In this work, we extend this methodology to LMs' backward pass and gradients. We first prove that a gradient matrix can be cast as a low-rank linear combination of its forward …
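The low-rank claim in the abstract follows from how weight gradients are formed during backpropagation. The sketch below is not from the paper; it is a minimal NumPy illustration of the general fact that for a linear layer y = Wx, the weight gradient is a sum of per-token rank-1 outer products between upstream gradients and forward-pass inputs, so its rank is bounded by the number of tokens. All variable names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n = 64, 128, 4  # hypothetical dims; n tokens << d_in, d_out

X = rng.standard_normal((n, d_in))       # forward-pass inputs, one row per token
Delta = rng.standard_normal((n, d_out))  # upstream gradients dL/dy, one row per token

# For y_i = W x_i, backprop gives dL/dW = sum_i delta_i x_i^T:
# a sum of n rank-1 terms, so rank(dL/dW) <= n.
grad_W = Delta.T @ X

print(np.linalg.matrix_rank(grad_W))  # at most n = 4
```

Because n (the sequence or batch length) is typically far smaller than the layer dimensions, the gradient matrix is low-rank, which is what lets it be expressed as a linear combination of forward-pass vectors and then projected to the vocabulary like the forward-pass quantities.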
