Feb. 1, 2024, 5:17 a.m. | /u/Dry_Cheesecake_8311

Machine Learning www.reddit.com

The Mamba authors claim that the 'Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size'.

What about a larger model, say a Mamba-13B (hypothetical), versus Mixtral 8x7B, both trained on large-scale pre-training data? Has anyone experimented with this?
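For anyone who wants to poke at this locally while waiting for a proper large-scale comparison, here is a minimal sketch using Hugging Face transformers. The checkpoint names (`state-spaces/mamba-2.8b-hf`, `mistralai/Mixtral-8x7B-v0.1`), the sample text, and the perplexity-on-a-snippet comparison are illustrative assumptions, not the pre-training-scale benchmark the question is really about; note that Mixtral 8x7B needs multiple large GPUs or heavy quantization just to load.

```python
# Minimal sketch, assuming the two checkpoints below fit on your hardware.
# It only compares per-token loss on one short text sample -- a sanity check,
# not a substitute for a matched large-scale pre-training comparison.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SAMPLE = "State space models such as Mamba process sequences in linear time."

def perplexity(model_id: str, text: str = SAMPLE) -> float:
    """Load a causal LM and return its perplexity on `text`."""
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return float(torch.exp(loss))

for mid in ("state-spaces/mamba-2.8b-hf", "mistralai/Mixtral-8x7B-v0.1"):
    print(mid, perplexity(mid))
```

A single-snippet perplexity number says nothing about scaling behaviour; a real answer would need both families trained on comparable token budgets and evaluated on standard benchmarks.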

