all AI news
Scalable Extraction of Training Data from (Production) Language Models (Paper Explained)
Dec. 3, 2023, 5 p.m. | Yannic Kilcher
Yannic Kilcher www.youtube.com
Researchers were able to get giant amounts of training data out of ChatGPT by simply asking it to repeat a word many times over, which causes the model to diverge and start spitting out memorized text.
Why does this happen? And how much of their training data do such models really memorize verbatim?
OUTLINE:
0:00 - Intro
8:05 - Extractable vs Discoverable Memorization
14:00 - Models leak more data than previously thought
20:25 - Some data is …
chatgpt data explained extraction language language models paper privacy production promptengineering researchers scalable text training training data word
More from www.youtube.com / Yannic Kilcher
[ML News] Chips, Robots, and Models
2 weeks, 5 days ago |
www.youtube.com
[ML News] Llama 3 changes the game
3 weeks, 5 days ago |
www.youtube.com
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US