Feb. 13, 2024, 5:14 p.m.

Simon Willison's Weblog simonwillison.net

Aya


"A global initiative led by Cohere For AI involving over 3,000 independent researchers across 119 countries. Aya is a state-of-art model and dataset, pushing the boundaries of multilingual AI for 101 languages through open science."


Both the model and the training data are released under the Apache 2.0 license. The training data looks particularly interesting: "513 million instances through templating and translating existing datasets across 114 languages", which suggests the data is mostly automatically generated.


Via Hacker News

