March 20, 2024, 3:20 p.m. | /u/avi1x

Machine Learning | www.reddit.com

Hi all, I implemented a sparse mixture of experts language model (basically a tiny version of what's used in Mixtral, Grok-1 and, reportedly, GPT-4) from scratch in pure PyTorch and trained it on Tiny Shakespeare. It is based largely on makemore from Andrej Karpathy (an autoregressive, character-level, decoder-only transformer model). My goal is for this to be a hackable implementation that people can use to understand how these models really work and to improve upon. I foresee more and more of these models …
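
For context, the core idea behind a sparse mixture of experts is to replace the transformer block's single feed-forward layer with several expert feed-forward networks plus a small router that sends each token to only its top-k experts. Below is a minimal sketch of that routing in plain PyTorch; the class names, sizes and hyperparameters are illustrative assumptions, not taken from the linked repo.

```python
# Minimal sketch of a sparse top-k mixture-of-experts layer in pure PyTorch.
# Names (Expert, SparseMoE) and sizes are illustrative, not from the repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A plain feed-forward block; each expert is one of these."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

class SparseMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, n_embd, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, num_experts)   # gating network
        self.experts = nn.ModuleList(Expert(n_embd) for _ in range(num_experts))

    def forward(self, x):                              # x: (B, T, C)
        logits = self.router(x)                        # (B, T, num_experts)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        # Softmax only over the selected experts; the rest get zero weight.
        gates = torch.full_like(logits, float('-inf'))
        gates.scatter_(-1, topk_idx, topk_vals)
        gates = F.softmax(gates, dim=-1)               # (B, T, num_experts)

        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            weight = gates[..., i:i + 1]               # (B, T, 1)
            out = out + weight * expert(x)             # weighted expert output
        # Note: a genuinely sparse implementation would skip unselected tokens
        # per expert instead of multiplying their contribution by zero.
        return out

# Usage: drop in place of the feed-forward layer inside a transformer block.
x = torch.randn(4, 16, 64)                 # (batch, sequence, embedding)
moe = SparseMoE(n_embd=64, num_experts=8, top_k=2)
print(moe(x).shape)                        # torch.Size([4, 16, 64])
```

This dense-loop version trades efficiency for readability, which matches the post's stated goal of a hackable reference rather than a production kernel.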

experts gpt gpt-4 grok language language model machinelearning mixtral mixture of experts project python pytorch scratch
