[D] I don't understand how backprop works on sparsely gated MoE
March 17, 2024, 2:22 a.m. | /u/Primary-Try8050
Machine Learning www.reddit.com
In the context of LLMs, say you have n experts and you choose the top k for each token.
During training, the gate network could be completely wrong and leave the correct expert out of the chosen k. However, since the correct expert was not used, the gate gets no chance to increase that expert's weight.
In other words, during backprop, only part of the …
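The issue can be shown numerically. Below is a minimal sketch (my own toy example, not from the post) of top-k gating in the style of Shazeer et al.'s sparsely gated MoE, where the softmax is taken over only the top-k logits. A finite-difference check of the loss with respect to each gate logit shows that the unselected experts receive exactly zero gradient, which is the behavior the question is asking about:

```python
import numpy as np

def moe_output(logits, expert_outputs, k=2):
    # Top-k gating: softmax over only the k largest logits;
    # every other expert gets exactly zero weight.
    top = np.argsort(logits)[-k:]
    w = np.exp(logits[top] - logits[top].max())
    w = w / w.sum()
    out = np.zeros_like(expert_outputs[0])
    for weight, idx in zip(w, top):
        out = out + weight * expert_outputs[idx]
    return out

def loss(logits, expert_outputs, target, k=2):
    # Simple squared-error loss on the mixture output.
    return float(np.sum((moe_output(logits, expert_outputs, k) - target) ** 2))

# 4 experts with fixed 3-dim outputs (illustrative numbers only).
rng = np.random.default_rng(0)
expert_outputs = [rng.normal(size=3) for _ in range(4)]
target = rng.normal(size=3)
logits = np.array([2.0, 1.0, -1.0, -2.0])  # experts 0 and 1 are the top-2

# Finite-difference gradient of the loss w.r.t. each gate logit.
eps = 1e-5
grad = np.zeros(4)
for i in range(4):
    bump = np.zeros(4)
    bump[i] = eps
    grad[i] = (loss(logits + bump, expert_outputs, target)
               - loss(logits - bump, expert_outputs, target)) / (2 * eps)

# Entries 2 and 3 are exactly zero: perturbing an unselected expert's
# logit does not change the loss, so the gate gets no learning signal
# for experts it left out of the top k.
print(grad)
```

This is why practical MoE implementations add load-balancing auxiliary losses and/or gate noise: without them, an expert that the gate never selects never receives gradient through the task loss, so a wrong initial routing can persist.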