Retentive Network: A Successor to Transformer for Large Language Models (Paper Explained)
Sept. 13, 2023, 12:08 a.m. | Yannic Kilcher
Yannic Kilcher www.youtube.com
Retention is an alternative to attention in Transformers that can be written in both a parallel and a recurrent form. This means the architecture achieves training parallelism while maintaining low-cost inference. Experiments in the paper look very promising.
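To make the parallel/recurrent duality concrete, here is a minimal NumPy sketch of a single retention head. It is a simplification, not the paper's full formulation: it assumes one real-valued scalar decay `gamma` and leaves out the position-dependent rotation, normalization, gating, and multi-scale heads. The function names are made up for this illustration.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: (Q K^T * D) V, with D[i, j] = gamma^(i-j) for j <= i, else 0."""
    n = Q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    # Causal decay mask; clamp the exponent so masked entries never overflow.
    D = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return ((Q @ K.T) * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_t = gamma * S_{t-1} + K_t^T V_t, output_t = Q_t S_t."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))          # fixed-size state, independent of sequence length
    out = np.empty((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

# Both forms give the same outputs on random inputs.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))
```

The parallel form processes the whole sequence with matrix products, which suits training. The recurrent form carries only the `d_k x d_v` state `S` from step to step, so the cost per generated token does not grow with sequence length.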
OUTLINE:
0:00 - Intro
2:40 - The impossible triangle
6:55 - Parallel vs sequential
15:35 - Retention mechanism
21:00 - Chunkwise and multi-scale retention
24:10 - Comparison to other architectures
26:30 - Experimental evaluation
Paper: https://arxiv.org/abs/2307.08621
Abstract:
In …