March 17, 2024, 2:22 a.m. | /u/Primary-Try8050

Machine Learning www.reddit.com

I don't understand how backprop works on sparsely gated MoE

In the context of LLMs, say you have n experts and you choose the top k for each token.

During training, the gate network could be completely wrong and leave the correct expert out of the chosen k. However, since the correct expert was not used, the gate gets no signal that would push it to increase that expert's weight.

In other words, during backprop, only part of the …
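To make the routing concrete, here is a minimal PyTorch sketch of a top-k gated MoE layer (the layer sizes, names, and the softmax-over-the-top-k-logits choice are illustrative assumptions, not any specific paper's exact formulation). Because unselected experts get exactly zero gating weight, neither their parameters nor their gate logits receive any gradient for that token, which is the situation described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Sparsely gated MoE layer: each token is routed to its top-k experts."""

    def __init__(self, d_model: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        logits = self.gate(x)                                # (n_tokens, n_experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)  # (n_tokens, k)

        # Softmax over the k selected logits only: unselected experts get
        # exactly zero weight, so during backprop neither their parameters
        # nor their gate logits receive any gradient for this token.
        topk_w = F.softmax(topk_logits, dim=-1)              # (n_tokens, k)

        # Scatter the k weights back into a dense (n_tokens, n_experts) matrix.
        dense_w = torch.zeros_like(logits).scatter(-1, topk_idx, topk_w)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            routed = (dense_w[:, e] > 0).nonzero(as_tuple=True)[0]  # token indices
            if routed.numel() > 0:
                # Only selected experts are evaluated, so only they appear in
                # the autograd graph for this batch of tokens.
                out = out.index_add(
                    0, routed,
                    dense_w[routed, e].unsqueeze(-1) * expert(x[routed]),
                )
        return out
```

(Real implementations typically also add noise to the gate logits and an auxiliary load-balancing loss to keep some exploration across experts; the sketch omits both and only shows the basic top-k path the question is about.)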

