Feb. 21, 2024, 5:41 a.m. | Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

cs.LG updates on arXiv.org

arXiv:2402.12399v1 Announce Type: new
Abstract: Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism incurs redundant computation and memory costs because of unbalanced routing: some experts overflow, and their excess tokens are dropped, while other experts are vacant and padded with zeros, which negatively impacts model performance. To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU …
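A minimal sketch (not from the paper) of the vanilla top-$k$ routing with a fixed expert capacity that the abstract critiques: tokens beyond an expert's capacity are dropped, and experts that receive fewer tokens than capacity are padded with zeros. The function name `topk_route` and the capacity-based buffering scheme are illustrative assumptions, not the paper's Rectify-Router.

```python
# Illustrative sketch of standard top-k MoE routing with expert capacity.
# Shows the two failure modes the abstract describes: overflow (dropped
# tokens) and vacancy (zero padding). Assumed names, not the paper's code.
import numpy as np

def topk_route(router_logits, k, capacity):
    """router_logits: (num_tokens, num_experts). Returns padded per-expert
    token buffers and the list of dropped (token, expert) assignments."""
    num_tokens, num_experts = router_logits.shape
    # Top-k expert indices per token (standard top-k routing rule).
    topk_experts = np.argsort(-router_logits, axis=1)[:, :k]

    buffers = {e: [] for e in range(num_experts)}
    dropped = []
    for t in range(num_tokens):
        for e in topk_experts[t]:
            if len(buffers[e]) < capacity:
                buffers[e].append(t)       # token fits in expert e's buffer
            else:
                dropped.append((t, e))     # overflow: token is dropped

    # Vacant slots are zero-padded so every expert processes a fixed-size batch.
    padded = {e: buffers[e] + [None] * (capacity - len(buffers[e]))
              for e in range(num_experts)}
    return padded, dropped

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 4))       # 8 tokens, 4 experts
    padded, dropped = topk_route(logits, k=2, capacity=3)
    print("per-expert token slots (None = zero padding):", padded)
    print("dropped (token, expert) pairs:", dropped)
```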
