MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Feb. 27, 2024, 5:44 a.m. | Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan,
cs.LG updates on arXiv.org arxiv.org
Abstract: State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models, including recent state-of-the-art open models. We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable performance. Our model, MoE-Mamba, outperforms both Mamba …
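The abstract describes interleaving Mamba (sequence-mixing) blocks with Mixture-of-Experts layers, where a router sends each token to one of several expert feed-forward networks. As a rough illustration of that routing idea only, here is a minimal numpy sketch of a top-1 (switch-style) MoE layer; the `seq_mix` function is a hypothetical stand-in for a real Mamba block, not the authors' implementation, and all names and dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Top-1 (switch-style) mixture-of-experts feed-forward layer (toy sketch)."""
    def __init__(self, d_model, d_ff, n_experts):
        self.w_gate = rng.normal(0, 0.02, (d_model, n_experts))   # router
        self.w1 = rng.normal(0, 0.02, (n_experts, d_model, d_ff)) # expert up-proj
        self.w2 = rng.normal(0, 0.02, (n_experts, d_ff, d_model)) # expert down-proj

    def __call__(self, x):  # x: (tokens, d_model)
        gate = softmax(x @ self.w_gate)   # routing probabilities per token
        expert = gate.argmax(-1)          # each token goes to its top-1 expert
        out = np.zeros_like(x)
        for e in range(self.w1.shape[0]):
            idx = np.where(expert == e)[0]
            if idx.size == 0:
                continue
            h = np.maximum(x[idx] @ self.w1[e], 0)          # expert FFN (ReLU)
            out[idx] = (h @ self.w2[e]) * gate[idx, e:e+1]  # scale by gate prob
        return out

def seq_mix(x):
    # Placeholder for a Mamba block: a simple causal running mean over tokens.
    return np.cumsum(x, axis=0) / np.arange(1, len(x) + 1)[:, None]

# Alternate a sequence-mixing block with an MoE layer, with a residual connection,
# mirroring the interleaved-layer pattern the abstract describes.
x = rng.normal(size=(8, 16))                      # 8 tokens, d_model = 16
moe = MoELayer(d_model=16, d_ff=32, n_experts=4)
y = x + moe(seq_mix(x))
print(y.shape)  # (8, 16)
```

Because only one expert runs per token, the layer's compute per token stays close to a single feed-forward network while total parameter count grows with the number of experts, which is the efficiency argument behind combining MoE with SSMs.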