April 6, 2024, 9:35 p.m. | /u/CriticalTemperature1

Machine Learning www.reddit.com

I've been looking into MoE models to understand why they work so well, and I was wondering whether there are ways to create an effectively "infinite" pool of experts, so the router can activate however many parameters a given task requires. As an example, Mixtral MoE uses 8 experts, but the architecture can scale to more: [https://arxiv.org/abs/2401.04088](https://arxiv.org/abs/2401.04088)

The idea is to have some logic that can choose which weights to multiply ahead of time, but I have a feeling that the computation required to …
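For concreteness, the routing part of a Mixtral-style layer can be sketched in a few lines of PyTorch. This is a simplified illustration, not Mixtral's actual implementation; the `TopKMoE` class and its parameters are made up for the example. The point is just that the gate decides ahead of time which experts' weights a token multiplies, with `k` (the number of active experts) being a tunable knob that could, in principle, be made input-dependent.

```python
# Simplified sketch of a top-k routed MoE feed-forward layer (illustrative,
# not Mixtral's actual code). Names like TopKMoE are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # The router ("gate") scores every expert for every token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                                   # (num_tokens, num_experts)
        topk_vals, topk_idx = torch.topk(logits, self.k, dim=-1)
        topk_weights = F.softmax(topk_vals, dim=-1)             # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts' weights are ever multiplied for a token,
        # which is what keeps the active parameter count far below the total.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
layer = TopKMoE(d_model=64, d_ff=256, num_experts=8, k=2)
print(layer(tokens).shape)  # torch.Size([4, 64])
```

Note that the per-token compute depends only on `k`, not on `num_experts`, so growing the expert pool toward "infinite" mostly adds cost in memory for the unused weights and in the routing logic itself, not in per-token FLOPs.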

