Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
Feb. 21, 2024, 5:43 a.m. | Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf
cs.LG updates on arXiv.org
Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the models' vocabularies, helping to uncover how information flows within LMs. In this work, we extend this methodology to LMs' backward pass and gradients. We first prove that a gradient matrix can be cast as a low-rank linear combination of its forward …
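The abstract's claim that a gradient matrix decomposes into a low-rank combination of forward-pass quantities follows from basic backpropagation algebra for a linear layer: for $y_t = W x_t$, the gradient $\partial L/\partial W$ is a sum of outer products $g_t x_t^\top$, one per token, so its rank is at most the number of tokens. A minimal numpy sketch of this fact (an illustration, not the paper's code; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, T = 32, 16, 3   # layer dimensions and number of tokens

X = rng.normal(size=(d_in, T))   # forward-pass inputs, one column per token
G = rng.normal(size=(d_out, T))  # upstream gradients dL/dy_t, one per token

# For y_t = W @ x_t, dL/dW = sum_t g_t @ x_t.T = G @ X.T:
# a sum of T rank-1 outer products, so rank(dL/dW) <= T.
grad_W = G @ X.T
print(np.linalg.matrix_rank(grad_W))  # at most T, far below min(d_out, d_in)
```

Because each rank-1 term is built from a forward-pass input, projecting these components to the vocabulary (as prior work does for weights and hidden states) extends naturally to the backward pass.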