March 16, 2024, 5:30 p.m. | /u/Successful-Western27

r/machinelearningnews | www.reddit.com

A new [paper](https://arxiv.org/pdf/2403.09611.pdf) from Apple presents MM1, a family of multimodal AI models that combine vision and language understanding. The researchers conducted extensive experiments to identify the key factors driving performance in these models, testing different architectural choices and pre-training data mixtures.

Here are my highlights from the paper:

The headline result: the largest MM1 model (30B dense) achieves state-of-the-art few-shot performance on multimodal benchmarks.

Key points:

* MM1 includes both dense models up to 30B parameters and mixture-of-experts (MoE) variants (see the sketch after this list for what an MoE layer looks like)
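
Since the post calls out MoE variants, here is a minimal sketch of a sparsely gated top-k mixture-of-experts layer in PyTorch. This illustrates the general technique only, not MM1's actual implementation; the class name, expert count, `top_k` value, and dimensions are all made up for the example.

```python
# Minimal sketch of a sparse mixture-of-experts (MoE) layer.
# Illustrative only; hyperparameters are not taken from MM1.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        gate_logits = self.router(x)                       # (B, S, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # renormalize over top-k
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts,
        # weighted by the (renormalized) router scores.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 16, 512)
print(MoELayer()(x).shape)  # torch.Size([2, 16, 512])
```

The design point: only `top_k` of `n_experts` experts run per token, so total parameter count grows with the number of experts while per-token compute stays roughly constant. That is why MoE variants let a model family scale capacity beyond its dense counterparts at similar inference cost.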

