all AI news
On the Representation Collapse of Sparse Mixture of Experts. (arXiv:2204.09179v2 [cs.CL] UPDATED)
May 23, 2022, 1:12 a.m. | Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Furu Wei
cs.CL updates on arXiv.org arxiv.org
Sparse mixture of experts provides larger model capacity while requiring a
constant computational overhead. It employs the routing mechanism to distribute
input tokens to the best-matched experts according to their hidden
representations. However, learning such a routing mechanism encourages token
clustering around expert centroids, implying a trend toward representation
collapse. In this work, we propose to estimate the routing scores between
tokens and experts on a low-dimensional hypersphere. We conduct extensive
experiments on cross-lingual language model pre-training and fine-tuning on …
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Lead Developer (AI)
@ Cere Network | San Francisco, US
Research Engineer
@ Allora Labs | Remote
Ecosystem Manager
@ Allora Labs | Remote
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote