April 26, 2024, 4:47 a.m. | Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

cs.CL updates on arXiv.org

arXiv:2404.16407v1 Announce Type: new
Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporating MoE into …

