Feb. 23, 2024, 5:43 a.m. | Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

cs.LG updates on arXiv.org arxiv.org

arXiv:2311.03233v2 Announce Type: replace
Abstract: In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of compute. This leads to the notion of a `compute-optimal' model, i.e. a model that …
