Cross-token Modeling with Conditional Computation. (arXiv:2109.02008v3 [cs.LG] UPDATED)
Jan. 17, 2022, 2:10 a.m. | Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You
cs.LG updates on arXiv.org
Mixture-of-Experts (MoE), a conditional computation architecture, has achieved
promising performance by scaling the local module (i.e., the feed-forward network)
of the Transformer. However, scaling the cross-token module (i.e., self-attention) is
challenging due to unstable training. This work proposes Sparse-MLP, an
all-MLP model that applies sparsely activated MLPs to cross-token modeling.
Specifically, in each Sparse block of our all-MLP model, we apply two stages of
MoE layers: one with MLP experts mixing information within channels along the image
patch dimension, the other with MLP experts mixing …
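
To make the conditional-computation idea concrete, here is a minimal, hypothetical sketch of a sparsely activated MoE layer with MLP experts and top-1 routing, written in PyTorch purely for illustration. The class name MoEMLP, the expert count, the hidden width, and the top-1 routing rule are assumptions for this sketch, not the paper's Sparse-MLP implementation (the abstract above is truncated and does not give these details).

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    # Sparsely activated MLP layer: each token is routed to exactly one
    # expert MLP (top-1 routing), so compute per token stays roughly flat
    # while total parameters grow with the number of experts.
    def __init__(self, dim, hidden_dim, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # token -> expert logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                            # x: (num_tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)     # routing probabilities
        weight, idx = gate.max(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                          # tokens assigned to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 196 tokens of width 256; only one expert MLP runs per token.
tokens = torch.randn(196, 256)
layer = MoEMLP(dim=256, hidden_dim=512)
print(layer(tokens).shape)   # torch.Size([196, 256])

Because only the selected expert runs for each token, adding experts increases capacity without a proportional increase in per-token compute, which is the scaling property the abstract attributes to conditional computation.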
Jobs in AI, ML, Big Data
Senior ML Researcher - 3D Geometry Processing | 3D Shape Generation | 3D Mesh Data
@ Promaton | Europe
Cleared Senior Software Engineer, Computer Vision, Federal
@ CCRi | Chantilly, Virginia, United States
Data Analyst - B2C
@ DAZN | Hyderabad, India
Product Marketing Manager - AI Chatbot
@ SendBird | San Mateo, California, United States
Apprenticeship: Embedded Real-Time Software / Computer Vision Development Engineer (F/M)
@ Alstom | Villeurbanne, FR
AOT Data Analyst II - Highway Project Delivery
@ State of Vermont | Barre, VT, US