all AI news
SADMoE: Exploiting Activation Sparsity with Dynamic-k Gating
Feb. 27, 2024, 5:43 a.m. | Filip Szatkowski, Bartosz W\'ojcik, Miko{\l}aj Pi\'orczy\'nski, Kamil Adamczewski
cs.LG updates on arXiv.org arxiv.org
Abstract: Transformer models, despite their impressive performance, often face practical limitations due to their high computational requirements. At the same time, such models exhibit significant activation sparsity, which can be leveraged to reduce the inference cost by transforming parts of the network into Mixture-of-Experts (MoE) layers. However, despite the crucial role of activation sparsity, its impact on this process remains unexplored. In this paper, we enhance the efficiency of MoE conversion through activation sparsity enforcement. Moreover, …
abstract arxiv computational cost cs.lg dynamic experts face inference limitations moe network performance practical reduce requirements sparsity transformer transformer models type
More from arxiv.org / cs.LG updates on arXiv.org
The Perception-Robustness Tradeoff in Deterministic Image Restoration
1 day, 15 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne