April 4, 2024, 4:42 a.m. | Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang

cs.LG updates on arXiv.org

arXiv:2404.02852v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth in training cost of dense transformers. As with dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation of model size and number of tokens? We study the scaling law of MoE-based LLMs regarding the relations between the model …
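To make the allocation question concrete, below is a minimal sketch of the standard Chinchilla-style compute-optimal allocation for dense models. The parametric loss form and the coefficients are illustrative assumptions (roughly the published dense-model fit), not the MoE scaling law fitted in this paper, whose form is cut off in the abstract above.

```python
# Sketch: compute-optimal (N, D) allocation under a fixed training budget,
# assuming a Chinchilla-style parametric loss L(N, D) = E + A/N**alpha + B/D**beta
# and the usual approximation that training cost is ~6*N*D FLOPs.
# Coefficients are illustrative, not the paper's fitted MoE values.
import numpy as np

E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(N, D):
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def optimal_allocation(C, grid=200):
    """Grid-search model sizes N, with D = C / (6N) implied by the budget,
    and return the (N, D) pair minimizing the assumed loss."""
    Ns = np.logspace(8, 12, grid)   # 100M to 1T parameters
    Ds = C / (6.0 * Ns)             # tokens affordable at each model size
    losses = loss(Ns, Ds)
    i = int(np.argmin(losses))
    return Ns[i], Ds[i], losses[i]

if __name__ == "__main__":
    C = 1e23  # training budget in FLOPs
    N, D, L = optimal_allocation(C)
    print(f"budget {C:.1e} FLOPs -> ~{N:.2e} params, ~{D:.2e} tokens, loss {L:.3f}")
```

Under these assumptions, the search simply trades parameters against tokens along the fixed-budget curve; the paper's contribution is to redo this kind of analysis for MoE models, where the trade-off additionally involves expert count and inference cost.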

