Feb. 6, 2024, 5:43 a.m. | Quang Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, Mina Sartipi, Binh T. Nguyen, Savith

cs.LG updates on arXiv.org (arxiv.org)

Sparse mixture of experts (SMoE) offers an appealing way to scale up model complexity beyond the means of increasing the network's depth or width. However, effective training of SMoE has proven challenging due to representation collapse, which causes parameter redundancy and limited representation potential. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to the experts with the highest neural response, we show that, under …
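To make the routing idea concrete, below is a minimal sketch of competition-based expert selection, based only on the abstract's description rather than the paper's actual algorithm. The expert structure, the use of the output's L2 norm as the "neural response" score, the softmax combination of the top-k winners, and names such as `Expert` and `competition_route` are all illustrative assumptions.

```python
# Illustrative sketch of competition-style routing for a sparse mixture of experts.
# Assumptions (not from the paper): experts are single ReLU layers, the "neural
# response" is the L2 norm of each expert's output, and the top-k responses are
# softmax-weighted and summed.
import numpy as np

rng = np.random.default_rng(0)

class Expert:
    """A toy feed-forward expert: y = relu(W x)."""
    def __init__(self, d_in, d_out):
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

    def __call__(self, x):
        return np.maximum(self.W @ x, 0.0)

def competition_route(x, experts, k=2):
    """Route a single token x to the k experts with the strongest response."""
    outputs = [e(x) for e in experts]
    scores = np.array([np.linalg.norm(o) for o in outputs])  # neural response per expert
    topk = np.argsort(scores)[-k:]                           # winners of the competition
    weights = np.exp(scores[topk] - scores[topk].max())      # softmax over winning scores
    weights /= weights.sum()
    combined = sum(w * outputs[i] for w, i in zip(weights, topk))
    return combined, topk, weights

# Usage: 8 experts, 16-dim input, keep the 2 most responsive experts.
experts = [Expert(d_in=16, d_out=16) for _ in range(8)]
x = rng.standard_normal(16)
y, winners, w = competition_route(x, experts, k=2)
print("selected experts:", winners, "weights:", np.round(w, 3))
```

Note that this sketch evaluates every expert to score the competition, which would be expensive at scale; a practical system would typically approximate the selection with a lightweight router, a detail the truncated abstract does not cover.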

Tags: beyond, competition, complexity, cs.LG, experts, issue, mean, mixture of experts, network, redundancy, representation, scale, solution, training, via
