Feb. 8, 2024, 5:46 a.m. | Zhengxuan Wu Atticus Geiger Thomas Icard Christopher Potts Noah D. Goodman

cs.CL updates on arXiv.org arxiv.org

Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods are faithful to the causal dynamics underlying model behavior and able to robustly generalize to unseen inputs. Distributed Alignment Search (DAS) is a powerful gradient descent method grounded in a theory of causal abstraction that has uncovered perfect alignments between interpretable symbolic algorithms and small deep learning models fine-tuned for specific tasks. In the present …

cs.cl

Doctoral Researcher (m/f/div) in Automated Processing of Bioimages

@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena

Research Scholar (Technical Research)

@ Centre for the Governance of AI | Hybrid; Oxford, UK

HPC Engineer (x/f/m) - DACH

@ Meshcapade GmbH | Remote, Germany

ETL Developer

@ Gainwell Technologies | Bengaluru, KA, IN, 560100

Medical Radiation Technologist, Breast Imaging

@ University Health Network | Toronto, ON, Canada

Data Scientist

@ PayPal | USA - Texas - Austin - Corp - Alterra Pkwy