Aug. 23, 2023, 1:01 p.m. | Jonathan Kemper

THE DECODER the-decoder.com


The Allen Institute for AI (AI2) has unveiled Dolma, an open-source dataset of three trillion tokens from a diverse collection of web content, scientific publications, code, and books.


The article "AI2 releases Dolma, the largest open-source dataset for LLMs" appeared first on THE DECODER.

