Toward Inference-optimal Mixture-of-Expert Large Language Models
April 4, 2024, 4:42 a.m. | Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang
cs.LG updates on arXiv.org
Abstract: Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling up model size without suffering the quadratic growth in training cost of dense transformers. Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation between model size and number of training tokens? We study the scaling law of MoE-based LLMs regarding the relations between the model …
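The compute-optimal allocation question the abstract poses is the same one Chinchilla-style scaling laws answer for dense models. As a hedged illustration only, the Python sketch below fits nothing and is not the paper's method: it plugs the dense-transformer parametric loss L(N, D) = E + A/N^α + B/D^β (coefficients from Hoffmann et al.'s dense fit, used here as placeholders) into the common C ≈ 6ND FLOPs approximation and solves for the loss-minimizing split. The paper's MoE-specific law, which the excerpt truncates, additionally involves quantities like the number of experts, which this sketch omits.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Placeholder coefficients from the dense-model Chinchilla fit
# (Hoffmann et al., 2022); NOT values fitted in this paper.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted pretraining loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def optimal_allocation(C):
    """Given a FLOPs budget C (~= 6*N*D for dense transformers),
    return the (N, D) pair that minimizes the predicted loss."""
    # Substitute D = C / (6N) and minimize over N in log-space.
    def objective(log_n):
        N = np.exp(log_n)
        return loss(N, C / (6.0 * N))

    res = minimize_scalar(objective,
                          bounds=(np.log(1e6), np.log(1e13)),
                          method="bounded")
    N_opt = np.exp(res.x)
    return N_opt, C / (6.0 * N_opt)

N, D = optimal_allocation(1e21)  # e.g. a 1e21-FLOP training budget
print(f"optimal params: {N:.3g}, optimal tokens: {D:.3g}")
```

For an MoE, the same recipe applies in principle, but the loss surface gains extra axes (e.g. expert count and active parameters per token), and inference cost can enter the objective alongside training FLOPs, which is the trade-off the paper's title points at.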