all AI news
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
March 15, 2024, 4:42 a.m. | Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
cs.LG updates on arXiv.org arxiv.org
Abstract: Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks, however, they have issues like (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse. Therefore, vanilla SMoE models are memory inefficient and non-scalable, especially for resource-constrained downstream scenarios. In this paper, we ask: Can we craft a compact …
abstract arxiv capacity cs.ai cs.cl cs.lg experts however memory merge multiple network networks neural networks policy redundancy routing scale type usage
More from arxiv.org / cs.LG updates on arXiv.org
The Perception-Robustness Tradeoff in Deterministic Image Restoration
2 days, 17 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne