April 18, 2024, 8:50 p.m.

Simon Willison's Weblog (simonwillison.net)

Andrej Karpathy's Llama 3 review


The most interesting coverage I've seen yet of Meta's Llama 3 models (8b and 70b so far, 400b promised later).


Andrej notes that Llama 3 was trained on 15 trillion tokens - up from 2 trillion for Llama 2 - and that Meta used that many even for the smaller 8b model, 75x more than the Chinchilla scaling laws would suggest.
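To sanity-check that multiplier: working backwards from the numbers in the post, 15 trillion tokens at 75x the Chinchilla-optimal budget implies a compute-optimal amount of roughly 200 billion tokens for an 8b model. A quick back-of-the-envelope sketch in Python (the 200B figure is derived here from the post's own numbers, not quoted from the review):

    # Rough check of the 75x claim. The ~200B-token Chinchilla-optimal
    # budget for an 8b model is inferred from 15T / 75.
    chinchilla_optimal = 200e9   # ~200 billion tokens for an 8b model
    actual = 15e12               # 15 trillion tokens used for Llama 3

    print(f"{actual / chinchilla_optimal:.0f}x beyond Chinchilla-optimal")
    # prints: 75x beyond Chinchilla-optimal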


The tokenizer has also changed - they now use a 128,000 token vocabulary, up from 32,000. This …
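One way to see what the 4x larger vocabulary buys is to tokenize the same text with both tokenizers and compare counts: a bigger vocabulary generally packs the same text into fewer tokens. A sketch using Hugging Face's transformers library (both model repos are gated, so you have to accept Meta's license and authenticate first; exact counts will vary with the text):

    from transformers import AutoTokenizer

    # Gated repos: requires accepting Meta's license on Hugging Face
    # and logging in (e.g. via `huggingface-cli login`).
    llama2 = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    llama3 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

    text = "Fifteen trillion tokens is a lot of training data."

    print(llama2.vocab_size)         # ~32,000
    print(llama3.vocab_size)         # ~128,000
    print(len(llama2.encode(text)))  # more tokens for the same text
    print(len(llama3.encode(text)))  # fewer tokens with the larger vocab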

