April 9, 2024, 4:43 a.m. | Zeyuan Allen-Zhu, Yuanzhi Li

cs.LG updates on arXiv.org

arXiv:2404.05405v1 Announce Type: cross
Abstract: Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model's capability via loss or benchmarks, we estimate the number of knowledge bits a model stores. We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page. Through multiple controlled datasets, we establish that language models can store, and can only store, 2 bits of knowledge per parameter, even …
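
To make the headline figure concrete, here is a minimal back-of-envelope sketch in Python, assuming only the 2-bits-per-parameter capacity estimate quoted in the abstract; the function name and the model sizes used are illustrative, not taken from the paper's code.

```python
# Back-of-envelope estimate of factual-knowledge capacity, assuming the
# abstract's ~2 bits of knowledge per parameter. Names and model sizes
# below are illustrative, not taken from the paper's code.

BITS_PER_PARAMETER = 2  # capacity estimate quoted in the abstract


def knowledge_capacity_bits(num_parameters: int) -> int:
    """Estimated knowledge capacity, in bits, for a model of the given size."""
    return BITS_PER_PARAMETER * num_parameters


if __name__ == "__main__":
    for params in (1_000_000_000, 7_000_000_000, 70_000_000_000):
        bits = knowledge_capacity_bits(params)
        print(f"{params / 1e9:.0f}B params -> {bits / 1e9:.0f}B bits "
              f"(~{bits / 8 / 2**30:.2f} GiB)")
```

Under that assumption, a 7B-parameter model would hold roughly 14B bits of factual knowledge, on the order of 1.6 GiB.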
