Feb. 13, 2024, 5:45 a.m. | Victor Lecomte Kushal Thaman Trevor Chow Rylan Schaeffer Sanmi Koyejo

cs.LG updates on arXiv.org arxiv.org

Polysemantic neurons -- neurons that activate for a set of unrelated features -- have been seen as a significant obstacle towards interpretability of task-optimized deep networks, with implications for AI safety. The classic origin story of polysemanticity is that the data contains more ``features" than neurons, such that learning to perform a task forces the network to co-allocate multiple unrelated features to the same neuron, endangering our ability to understand networks' internal processing. In this work, we present a second …

cs.ai cs.lg cs.ne data features interpretability mixed networks neurons safety set story

Research Scholar (Technical Research)

@ Centre for the Governance of AI | Hybrid; Oxford, UK

HPC Engineer (x/f/m) - DACH

@ Meshcapade GmbH | Remote, Germany

Senior Analytics Engineer (Retail)

@ Lightspeed Commerce | Toronto, Ontario, Canada

Data Scientist II, BIA GPS India Operations

@ Bristol Myers Squibb | Hyderabad

Analytics Engineer

@ Bestpass | Remote

Senior Analyst - Data Management

@ Marsh McLennan | Mumbai - Hiranandani