Jan. 13, 2024, 4:12 p.m. | Yannic Kilcher

Yannic Kilcher | www.youtube.com

#mixtral #mistral #chatgpt

OUTLINE:
0:00 - Introduction
3:00 - Mixture of Experts
6:00 - Classic Transformer Blocks
11:15 - Expert Routing
17:00 - Sparse Expert Routing
22:00 - Expert Parallelism
25:00 - Experimental Results
31:30 - Routing Analysis
33:20 - Conclusion

Paper: https://arxiv.org/abs/2401.04088

Abstract:
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every …
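
To make the sparse-expert idea from the abstract concrete, below is a minimal PyTorch sketch of a sparse mixture-of-experts feedforward layer with 8 experts and top-2 routing, the pattern Mixtral uses in place of a single feedforward block. The class name SparseMoE, the dimensions d_model=512 and d_ff=2048, and the plain two-layer expert MLP are illustrative assumptions for readability, not Mixtral's actual implementation or sizes.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    # Sparse MoE feedforward layer: a linear router scores all experts per
    # token, only the top-k experts run, and their outputs are mixed with
    # the softmax-renormalized router scores. (Hypothetical sizes; the
    # 8-experts / top-2 pattern follows the Mixtral paper.)
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(SparseMoE()(tokens).shape)                # torch.Size([4, 512])

Because only two of the eight expert MLPs run for each token, each token touches a fraction of the model's total parameters per layer, which is what lets a sparse model of this kind keep inference cost close to that of a much smaller dense model.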
