Mixtral of Experts (Paper Explained)
Jan. 13, 2024, 4:12 p.m. | Yannic Kilcher (www.youtube.com)
OUTLINE:
0:00 - Introduction
3:00 - Mixture of Experts
6:00 - Classic Transformer Blocks
11:15 - Expert Routing
17:00 - Sparse Expert Routing
22:00 - Expert Parallelism
25:00 - Experimental Results
31:30 - Routing Analysis
33:20 - Conclusion
Paper: https://arxiv.org/abs/2401.04088
Abstract:
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every …
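To make the architecture described in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of a sparse Mixture-of-Experts feedforward layer: a router scores 8 experts per token, keeps the top 2, and combines their outputs with softmax-normalized weights. The class name, dimensions, and the plain-MLP experts are illustrative assumptions, not the paper's implementation (Mixtral's experts are SwiGLU blocks at a much larger hidden size).

```python
# Sketch of a sparse MoE feedforward layer with top-2 routing over 8 experts.
# Dimensions and expert internals are simplified placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one linear layer scoring each expert per token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Experts: independent feedforward blocks (simplified two-layer MLPs).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (n_tokens, d_model)
        logits = self.router(x)                 # (n_tokens, n_experts)
        # Keep only the top-k experts per token; softmax over the kept logits.
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # (n_tokens, top_k)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token (sparse compute).
        for e, expert in enumerate(self.experts):
            mask = topk_idx == e                # (n_tokens, top_k)
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

# Example: route 10 tokens of dimension 512 through the layer.
layer = SparseMoELayer()
y = layer(torch.randn(10, 512))
print(y.shape)  # torch.Size([10, 512])
```

Because only 2 of the 8 experts run per token, each token touches roughly a quarter of the layer's parameters per forward pass, which is the "sparse" part of SMoE.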