Feb. 27, 2024, 5:44 a.m. | Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho

cs.LG updates on arXiv.org

arXiv:2309.13850v2 Announce Type: replace-cross
Abstract: The top-K sparse softmax gating mixture of experts has been widely used to scale up massive deep-learning architectures without increasing computational cost. Despite its popularity in real-world applications, the theoretical understanding of this gating function has remained an open problem. The main challenge comes from the structure of the top-K sparse softmax gating function, which partitions the input space into multiple regions with distinct behaviors. By focusing on a Gaussian mixture of experts, we establish …
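To make the gating mechanism described in the abstract concrete, here is a minimal sketch of a top-K sparse softmax gating mixture-of-experts layer. It is an illustrative assumption, not the paper's model or experimental setup: expert and gate sizes are hypothetical, and for simplicity all experts are evaluated before selection, whereas practical implementations dispatch each input only to its selected experts.

```python
import torch
import torch.nn as nn


class TopKSparseGateMoE(nn.Module):
    """Minimal sketch of a top-K sparse softmax gating mixture of experts."""

    def __init__(self, d_in, d_out, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_in, n_experts)  # gating network
        self.experts = nn.ModuleList(
            [nn.Linear(d_in, d_out) for _ in range(n_experts)]
        )

    def forward(self, x):
        logits = self.gate(x)                              # (batch, n_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # keep K largest gate logits
        # Softmax is applied only over the K selected logits; the remaining
        # experts receive weight 0, which is what partitions the input space
        # into regions with distinct behaviors.
        weights = torch.softmax(topk_vals, dim=-1)         # (batch, k)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, d_out)
        chosen = expert_out.gather(
            1, topk_idx.unsqueeze(-1).expand(-1, -1, expert_out.size(-1))
        )                                                  # (batch, k, d_out)
        return (weights.unsqueeze(-1) * chosen).sum(dim=1)


# Usage: route a batch of 4 inputs through 8 experts, keeping the top 2 per input.
moe = TopKSparseGateMoE(d_in=16, d_out=16, n_experts=8, k=2)
y = moe(torch.randn(4, 16))
print(y.shape)  # torch.Size([4, 16])
```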

