April 6, 2024, 3:42 p.m. | /u/RepresentativeWay0

Machine Learning | www.reddit.com

Looking at the code for current mixture-of-experts models, they seem to use argmax with k=1 (picking only the top expert) to select the router's choice. Since argmax is non-differentiable, the gradient cannot flow to the other experts, so it seems that only the weights of the selected expert will be updated, even if it performs poorly. However, it could be the case that a different expert was in fact a better choice for the given input, …
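A minimal PyTorch sketch of this routing pattern illustrates the point (the class name and the switch-style scaling by the router probability are illustrative assumptions, not taken from any particular implementation): unselected experts are never invoked in the forward pass, so autograd records no path through them and their parameters receive no gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Toy top-1 (k=1) mixture-of-experts layer with hard argmax routing."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Router scores -> probability over experts for each input row.
        probs = F.softmax(self.router(x), dim=-1)   # (batch, num_experts)
        top_p, top_idx = probs.max(dim=-1)          # hard argmax choice
        # Each row is processed by exactly one expert; the others are never
        # called, so autograd builds no graph through them.
        out = torch.stack([self.experts[int(i)](xi) for xi, i in zip(x, top_idx)])
        # Scaling by the selected probability keeps a gradient path to the
        # router weights; the argmax itself is non-differentiable.
        return top_p.unsqueeze(-1) * out

moe = Top1MoE(dim=16, num_experts=4)
loss = moe(torch.randn(8, 16)).sum()
loss.backward()
# Experts that were never selected for any row keep .grad == None:
for i, expert in enumerate(moe.experts):
    print(i, expert.weight.grad is None)
```

Note that the softmax still couples the router logits, so the router itself gets some signal about other experts; it is the unselected expert networks whose weights go untouched, which is exactly the situation the question describes.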

