Jan. 10, 2024, 4 a.m. | /u/APaperADay

Machine Learning | www.reddit.com

**Paper**: [https://arxiv.org/abs/2401.04081](https://arxiv.org/abs/2401.04081)

**Code**: [https://github.com/llm-random/llm-random](https://github.com/llm-random/llm-random)

**Abstract**:

>State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based LLMs, including recent state-of-the-art open-source models. We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable, Transformer-like performance. Our model, **MoE-Mamba**, outperforms both Mamba and …

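For readers curious what the combination looks like in practice, below is a minimal sketch of one MoE-Mamba-style block: a Mamba (SSM) layer for sequence mixing interleaved with a switch-style top-1 Mixture-of-Experts feed-forward layer. The layer sizes, the router, and the use of the `mamba_ssm` package are illustrative assumptions, not the paper's or the linked repo's exact implementation.

```python
# Minimal sketch of interleaving a Mamba (SSM) layer with a switch-style
# top-1 Mixture-of-Experts feed-forward layer. Hyperparameters, routing
# details, and the `mamba_ssm` dependency are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from mamba_ssm import Mamba  # assumed dependency: pip install mamba-ssm


class MoEFeedForward(nn.Module):
    """Top-1 (switch) routing over a set of small feed-forward experts."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq, d_model = x.shape
        flat = x.reshape(-1, d_model)                # (batch * seq, d_model)
        gate = F.softmax(self.router(flat), dim=-1)  # routing probabilities
        weight, expert_idx = gate.max(dim=-1)        # top-1 expert per token
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale each token's expert output by its routing weight.
                out[mask] = weight[mask].unsqueeze(-1) * expert(flat[mask])
        return out.reshape(batch, seq, d_model)


class MoEMambaBlock(nn.Module):
    """One block: a Mamba mixing layer followed by an MoE feed-forward layer,
    each with pre-normalization and a residual connection."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model)  # sequence mixing (SSM)
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = MoEFeedForward(d_model, d_ff, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mamba(self.norm1(x))  # (batch, seq, d_model)
        x = x + self.moe(self.norm2(x))
        return x
```

Stacking several such blocks gives the interleaved SSM/MoE layout the abstract describes; a production version would also need a load-balancing auxiliary loss and capacity limits for the experts.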
