Sept. 14, 2023, 3:07 p.m. | /u/ain92ru | Machine Learning (www.reddit.com)

Paper: [https://arxiv.org/abs/2306.04640](https://arxiv.org/abs/2306.04640)

GitHub: [https://github.com/ibm/moduleformer](https://github.com/ibm/moduleformer) (under Apache 2.0)

Twitter thread: [https://twitter.com/Yikang_Shen/status/1702041129267388678](https://twitter.com/Yikang_Shen/status/1702041129267388678)

Abstract:

>Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models. ModuleFormer is based on the Sparse Mixture of Experts (SMoE). Unlike the previous SMoE-based …
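
Since the abstract only names the building block, here is a minimal sketch of a generic top-k Sparse Mixture of Experts (SMoE) layer in PyTorch to illustrate the routing idea. This is not ModuleFormer's actual implementation (see the GitHub repo above for that); the model dimensions, expert count, and `top_k` value are illustrative assumptions.

```python
# Minimal sketch of a top-k Sparse Mixture of Experts (SMoE) layer.
# Generic illustration only; not ModuleFormer's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router scoring each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens so each is routed independently
        tokens = x.reshape(-1, x.size(-1))
        scores = self.gate(tokens)                       # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, -1)  # keep only the top-k experts per token
        weights = F.softmax(top_vals, dim=-1)            # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # indices of tokens that routed to expert e, and which top-k slot it occupies
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # this expert received no tokens; skip its computation (the sparsity win)
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SparseMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

The key point of the sparse routing is that each token only pays for `top_k` experts rather than all of them, which is what the abstract's efficiency claim builds on.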
