GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. (arXiv:2112.06905v2 [cs.CL] UPDATED)
Aug. 3, 2022, 1:11 a.m. | Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Lia
cs.CL updates on arXiv.org | arxiv.org
Scaling language models with more data, compute, and parameters has driven
significant progress in natural language processing. For example, thanks to
scaling, GPT-3 achieved strong results on in-context learning tasks. However,
training these large dense models requires substantial computing resources. In
this paper, we propose and develop a family of language models named GLaM
(Generalist Language Model), which uses a sparsely activated mixture-of-experts
architecture to scale model capacity while also incurring substantially less
training …
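The key idea in the excerpt is sparse activation: a learned router sends each token to only a small subset of expert feed-forward networks, so total parameter count grows with the number of experts while per-token compute stays roughly constant. Below is a minimal PyTorch sketch of a top-2-gated MoE layer in that spirit. The layer sizes, expert count, and routing details here are illustrative assumptions, not GLaM's actual implementation, which also involves pieces omitted here such as load-balancing auxiliary losses and expert parallelism.

```python
# Minimal sketch of a sparsely activated mixture-of-experts (MoE) layer with
# top-2 gating. Illustrative only; dimensions and details are assumptions,
# not GLaM's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopTwoMoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token is routed to its top-2 experts,
        # so only 2/num_experts of the expert parameters are active per token.
        gate_logits = self.router(x)                       # (tokens, experts)
        top2_vals, top2_idx = gate_logits.topk(2, dim=-1)  # (tokens, 2)
        top2_weights = F.softmax(top2_vals, dim=-1)        # normalize over the 2

        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = top2_idx[:, slot] == e
                if mask.any():
                    out[mask] += top2_weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: 8 tokens pass through a layer with 8 experts, but each token
# only activates 2 of them.
layer = TopTwoMoELayer(d_model=16, d_hidden=32, num_experts=8)
tokens = torch.randn(8, 16)
print(layer(tokens).shape)  # torch.Size([8, 16])
```

The per-expert loop is written for clarity; a production implementation would instead gather tokens into per-expert batches and dispatch them in parallel, and would add a load-balancing term to keep the router from collapsing onto a few experts.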
More from arxiv.org / cs.CL updates on arXiv.org
Benchmarking LLMs via Uncertainty Quantification
2 days, 8 hours ago | arxiv.org
CARE: Extracting Experimental Findings From Clinical Literature
2 days, 8 hours ago | arxiv.org