U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
April 26, 2024, 4:47 a.m. | Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang
cs.CL updates on arXiv.org
Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, Mixture-of-Experts (MoE) has been proposed as an energy-efficient path to even larger and more capable language models, by learning to activate only a subset of parameters during training and inference. This shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporate MoE into …
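The core mechanism the abstract alludes to, activating only a subset of expert parameters per token through a learned gate, can be illustrated with a minimal top-k MoE feed-forward layer. The sketch below is a generic PyTorch illustration, not the paper's U2++ MoE; the class name, expert count, and routing details are assumptions made for clarity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Generic top-k gated mixture-of-experts feed-forward layer
    # (illustrative sketch, not the U2++ MoE described in the paper).
    def __init__(self, d_model, d_ff, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, num_experts)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)    # renormalise over chosen experts
        out = torch.zeros_like(x)
        # Only k experts run per token, so inference cost scales with k,
        # not with the total parameter count.
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

With k fixed (here 2), adding more experts grows total parameters while the per-token compute, and thus the real-time factor in an ASR decoder, stays roughly constant, which is the trade-off the title refers to.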