Dec. 3, 2023, 5 p.m. | Yannic Kilcher

Yannic Kilcher www.youtube.com

#chatgpt #privacy #promptengineering

Researchers were able to get giant amounts of training data out of ChatGPT by simply asking it to repeat a word many times over, which causes the model to diverge and start spitting out memorized text.
Why does this happen? And how much of their training data do such models really memorize verbatim?

OUTLINE:
0:00 - Intro
8:05 - Extractable vs Discoverable Memorization
14:00 - Models leak more data than previously thought
20:25 - Some data is …

chatgpt data explained extraction language language models paper privacy production promptengineering researchers scalable text training training data word

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US