[N] MIT-IBM Watson AI Lab releases the MoLM suite of three small sparse MoE models; the largest (8B total params, ~700M active per token) performs on par with Pythia 2.8B while its throughput is comparable to Pythia 1.4B
Sept. 14, 2023, 3:07 p.m. | /u/ain92ru
Source: r/MachineLearning (www.reddit.com)
GitHub: [https://github.com/ibm/moduleformer](https://github.com/ibm/moduleformer) (under Apache 2.0)
Twitter thread: [https://twitter.com/Yikang_Shen/status/1702041129267388678](https://twitter.com/Yikang_Shen/status/1702041129267388678)
Abstract:
>Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models. ModuleFormer is based on the Sparse Mixture of Experts (SMoE). Unlike the previous SMoE-based …
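For context, below is a minimal sketch of the general sparse Mixture-of-Experts (SMoE) idea the abstract builds on: a router picks a few experts per token, so compute scales with active rather than total parameters. This is not the ModuleFormer implementation (see the GitHub repo for that); the layer sizes, top-2 routing, and all names here are illustrative assumptions.

```python
# Illustrative top-k sparse MoE feed-forward layer (not ModuleFormer's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                          # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Only top_k experts run per token, so compute tracks active (not total) parameters,
# which is why an 8B-total model can have throughput closer to a much smaller dense model.
x = torch.randn(2, 16, 512)
print(SparseMoE()(x).shape)  # torch.Size([2, 16, 512])
```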