March 20, 2024, 3:20 p.m. | /u/avi1x

Machine Learning | www.reddit.com

Hi all, I implemented a sparse mixture of experts language model (basically a tiny version of what's used in Mixtral, Grok-1 and, reportedly, GPT-4) from scratch in pure PyTorch and trained it on Tiny Shakespeare. It is based largely on makemore from Andrej Karpathy (an autoregressive, character-level, decoder-only transformer model). My goal is for this to be a hackable implementation that people can use to understand how these models really work and to improve upon. I foresee more and more of these models …
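
For context, the core idea behind a sparse mixture of experts is to replace the transformer block's single feed-forward layer with several expert feed-forward networks plus a small router that sends each token to only its top-k experts. Below is a minimal sketch of that routing in plain PyTorch; the class names, sizes and hyperparameters are illustrative assumptions, not taken from the linked repo.

```python
# Minimal sketch of a sparse top-k mixture-of-experts layer in pure PyTorch.
# Names (Expert, SparseMoE) and sizes are illustrative, not from the repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A plain feed-forward block; each expert is one of these."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

class SparseMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, n_embd, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, num_experts)   # gating network
        self.experts = nn.ModuleList(Expert(n_embd) for _ in range(num_experts))

    def forward(self, x):                              # x: (B, T, C)
        logits = self.router(x)                        # (B, T, num_experts)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        # Softmax only over the selected experts; the rest get zero weight.
        gates = torch.full_like(logits, float('-inf'))
        gates.scatter_(-1, topk_idx, topk_vals)
        gates = F.softmax(gates, dim=-1)               # (B, T, num_experts)

        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            weight = gates[..., i:i + 1]               # (B, T, 1)
            out = out + weight * expert(x)             # weighted expert output
        # Note: a genuinely sparse implementation would skip unselected tokens
        # per expert instead of multiplying their contribution by zero.
        return out

# Usage: drop in place of the feed-forward layer inside a transformer block.
x = torch.randn(4, 16, 64)                 # (batch, sequence, embedding)
moe = SparseMoE(n_embd=64, num_experts=8, top_k=2)
print(moe(x).shape)                        # torch.Size([4, 16, 64])
```

This dense-loop version trades efficiency for readability, which matches the post's stated goal of a hackable reference rather than a production kernel.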

experts gpt gpt-4 grok language language model machinelearning mixtral mixture of experts project python pytorch scratch
