Jan. 9, 2024, 4:03 a.m.

Simon Willison's Weblog (simonwillison.net)

Mixtral of Experts


The Mixtral paper is out, exactly a month after the release of the Mixtral 8x7B model itself. Thanks to the paper I now have a reasonable understanding of how a mixture of experts model works: each layer has 8 available blocks, but a router model selects two out of those eight for each token passing through that layer and combines their output. "As a result, each token has access to 47B parameters, but only uses 13B active …
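To make that routing concrete, here is a minimal PyTorch sketch of a top-2 mixture-of-experts layer. The dimensions and the plain feed-forward experts are toy placeholders rather than the real Mixtral blocks, but the core step matches the description above: a router scores all eight experts per token, the top two are selected, and their outputs are combined with softmax-normalised weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=32, hidden=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is an independent feed-forward block (toy version).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        # The router produces one score per expert for every token.
        self.router = nn.Linear(dim, n_experts)

    def forward(self, x):                          # x: (n_tokens, dim)
        scores = self.router(x)                    # (n_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)      # normalise over the chosen two
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # combine the two selected experts
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 32)        # four toy "tokens"
print(MoELayer()(tokens).shape)    # torch.Size([4, 32])
```

Because each token only ever passes through its two selected experts, the parameters of the other six blocks sit idle for that token, which is how the model can hold roughly 47B parameters while activating only about 13B per token.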

