April 18, 2024, 8:50 p.m.

Simon Willison's Weblog simonwillison.net

Andrej Karpathy's Llama 3 review


The most interesting coverage I've seen so far of Meta's Llama 3 models (8b and 70b so far, 400b promised later).


Andrej notes that Llama 3 was trained on 15 trillion tokens - up from 2 trillion for Llama 2 - and that Meta used that many even for the smaller 8b model, 75x more than the Chinchilla scaling laws would suggest.
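
To make the 75x figure concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted above (15 trillion tokens, 2 trillion tokens, an 8 billion parameter model, and the 75x ratio) and works backwards to the token budget that ratio implies. The function and constant names are made up for illustration, and "8b" is assumed to mean exactly 8 billion parameters.

```python
# Figures quoted in the post; "8b" is assumed to mean 8 billion parameters.
LLAMA3_TRAINING_TOKENS = 15e12  # 15 trillion tokens
LLAMA2_TRAINING_TOKENS = 2e12   # 2 trillion tokens
MODEL_PARAMS = 8e9              # the smaller 8b model
QUOTED_RATIO = 75               # "75x more than the scaling laws would suggest"


def implied_optimal_tokens(training_tokens: float, ratio: float) -> float:
    """Token budget implied by 'trained on `ratio` times more than optimal'."""
    return training_tokens / ratio


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


optimal = implied_optimal_tokens(LLAMA3_TRAINING_TOKENS, QUOTED_RATIO)

print(f"Implied compute-optimal budget: {optimal / 1e9:.0f}B tokens")
print(f"Implied tokens per parameter:   {tokens_per_param(optimal, MODEL_PARAMS):.0f}")
print(f"Actual tokens per parameter:    {tokens_per_param(LLAMA3_TRAINING_TOKENS, MODEL_PARAMS):.0f}")
print(f"Llama 3 vs Llama 2 data:        {LLAMA3_TRAINING_TOKENS / LLAMA2_TRAINING_TOKENS:.1f}x")
```

Under these assumptions the 75x figure corresponds to a compute-optimal budget of about 200 billion tokens for the 8b model, or roughly 25 tokens per parameter. The commonly cited rule of thumb of about 20 tokens per parameter would give a somewhat larger multiple, so treat the exact ratio as approximate.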


The tokenizer has also changed - they now use 128,000 tokens, up from 32,000. This …

