all AI news
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
May 20, 2024, 4:42 a.m. | Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel H\"anni, Cindy Wu, Marius Hobbhahn
cs.LG updates on arXiv.org arxiv.org
Abstract: Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and parameterizations with more degeneracy are likely to …
abstract algorithms arxiv computation cs.lg engineer engineering inside interpretability landscape loss network networks neural networks parameters studying type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Senior Machine Learning Engineer
@ GPTZero | Toronto, Canada
ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)
@ HelloBetter | Remote
Doctoral Researcher (m/f/div) in Automated Processing of Bioimages
@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena
Seeking Developers and Engineers for AI T-Shirt Generator Project
@ Chevon Hicks | Remote
Senior Applied Data Scientist
@ dunnhumby | London
Principal Data Architect - Azure & Big Data
@ MGM Resorts International | Home Office - US, NV