Llama 3 from Scratch?? 15T Tokens Data for you!!!
April 21, 2024, 10:24 p.m. | 1littlecoder
1littlecoder www.youtube.com
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large-scale data processing library.
🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release of …
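To make "cleaned and deduplicated" concrete, here is a minimal sketch in plain Python. It is not the FineWeb pipeline and does not use the datatrove API. The function names, the word-count threshold, and the sample pages are all hypothetical, and exact-hash deduplication is a stand-in chosen for illustration only.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_and_dedup(docs, min_words=5):
    """Yield documents that pass a toy length filter and have not been seen before.

    Assumption: `min_words` is an illustrative threshold, not a FineWeb setting.
    """
    seen = set()
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # toy quality filter: drop very short pages
        digest = hashlib.sha1(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        yield doc


# Hypothetical sample pages standing in for crawled web text.
pages = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick   brown fox jumps over the lazy dog.",
    "Click here",
    "FineWeb is a large web dataset built from CommonCrawl.",
]
kept = list(clean_and_dedup(pages))
```

With these sample pages, the second entry is dropped as a duplicate of the first and the third is dropped as too short. A web-scale pipeline would need far more than this (language filtering, near-duplicate detection, distributed execution), which is the kind of work a dedicated processing library handles.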