AI2 releases Dolma, the largest open-source dataset for LLMs
Aug. 23, 2023, 1:01 p.m. | Jonathan Kemper
THE DECODER the-decoder.com
The Allen Institute for AI (AI2) has unveiled Dolma, an open-source dataset of three trillion tokens from a diverse collection of web content, scientific publications, code, and books.