[P] Don't have enough GPU to train Mixtral? Why not try LLaMA-MoE~
Dec. 25, 2023, 11:36 a.m. | /u/Spico197
Machine Learning www.reddit.com
1. Partition LLaMA's FFNs into sparse experts and insert a top-K gate for each layer of experts (a rough sketch follows this list).
2. Continually pre-train the initialized MoE model with optimized data sampling weights from Sheared LLaMA and filtered datasets from SlimPajama.
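
For intuition, here is a minimal PyTorch sketch of step 1, assuming an 8-expert, top-2 configuration with SwiGLU-style FFN shards; the class name, sizes, and routing details are illustrative and not taken from the LLaMA-MoE code.

```python
# Sketch only: split a dense FFN's intermediate neurons into E experts and
# route each token through the top-K of them. The expert count, hidden sizes,
# and softmax re-normalization here are assumptions, not the LLaMA-MoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFFNMoE(nn.Module):
    def __init__(self, d_model=4096, d_ff=11008, num_experts=8, top_k=2):
        super().__init__()
        assert d_ff % num_experts == 0
        d_expert = d_ff // num_experts          # each expert owns a slice of the FFN neurons
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # SwiGLU-style experts, each a shard of the original dense FFN
        self.experts = nn.ModuleList(
            nn.ModuleDict({
                "gate_proj": nn.Linear(d_model, d_expert, bias=False),
                "up_proj":   nn.Linear(d_model, d_expert, bias=False),
                "down_proj": nn.Linear(d_expert, d_model, bias=False),
            })
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.gate(x)                    # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # re-normalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            tokens, slot = (idx == e).nonzero(as_tuple=True)
            if tokens.numel() == 0:
                continue
            h = expert["down_proj"](
                F.silu(expert["gate_proj"](x[tokens])) * expert["up_proj"](x[tokens])
            )
            out[tokens] += weights[tokens, slot].unsqueeze(-1) * h
        return out
```

Each token only activates top_k of the num_experts FFN shards per layer, which is what keeps the active parameter count (and GPU memory for activations) well below the dense model's.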
If you don't have plenty of computing resources to train Mixtral, you may want to try LLaMA-MoE for downstream research.
Check …