Feb. 21, 2024, 5:41 a.m. | Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

cs.LG updates on arXiv.org

arXiv:2402.12399v1 Announce Type: new
Abstract: Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism incurs redundant computation and memory costs because of unbalanced routing: some experts overflow, and their excess tokens are dropped, while other experts are vacant and padded with zeros, which negatively impacts model performance. To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU …
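A minimal sketch (not from the paper) of the vanilla top-$k$ routing with a fixed expert capacity that the abstract critiques: tokens beyond an expert's capacity are dropped, and experts that receive fewer tokens than capacity are padded with zeros. The function name `topk_route` and the capacity-based buffering scheme are illustrative assumptions, not the paper's Rectify-Router.

```python
# Illustrative sketch of standard top-k MoE routing with expert capacity.
# Shows the two failure modes the abstract describes: overflow (dropped
# tokens) and vacancy (zero padding). Assumed names, not the paper's code.
import numpy as np

def topk_route(router_logits, k, capacity):
    """router_logits: (num_tokens, num_experts). Returns padded per-expert
    token buffers and the list of dropped (token, expert) assignments."""
    num_tokens, num_experts = router_logits.shape
    # Top-k expert indices per token (standard top-k routing rule).
    topk_experts = np.argsort(-router_logits, axis=1)[:, :k]

    buffers = {e: [] for e in range(num_experts)}
    dropped = []
    for t in range(num_tokens):
        for e in topk_experts[t]:
            if len(buffers[e]) < capacity:
                buffers[e].append(t)       # token fits in expert e's buffer
            else:
                dropped.append((t, e))     # overflow: token is dropped

    # Vacant slots are zero-padded so every expert processes a fixed-size batch.
    padded = {e: buffers[e] + [None] * (capacity - len(buffers[e]))
              for e in range(num_experts)}
    return padded, dropped

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 4))       # 8 tokens, 4 experts
    padded, dropped = topk_route(logits, k=2, capacity=3)
    print("per-expert token slots (None = zero padding):", padded)
    print("dropped (token, expert) pairs:", dropped)
```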
