s
April 17, 2023, 5:13 p.m. |

Simon Willison's Weblog simonwillison.net

RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens


With the amount of projects that have used LLaMA as a foundation model since its release two months ago - despite its non-commercial license - it's clear that there is a strong desire for a fully openly licensed alternative.

RedPajama is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute aiming to build exactly …

ai collaboration dataset foundation foundation model generativeai homebrewllms llama llms opensource project projects redpajama release research stanford tokens training

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US