Retentive Network: A Successor to Transformer for Large Language Models (Paper Explained)
Sept. 13, 2023, 12:08 a.m. | Yannic Kilcher
Retention is an alternative to attention in Transformers that can be written in both a parallel and a recurrent form. This means the architecture achieves training parallelism while keeping inference cheap. The experiments in the paper look very promising.
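To make the parallel/recurrent equivalence concrete, here is a minimal NumPy sketch of a single retention head as described in the paper (arXiv:2307.08621). The shapes and the decay value gamma are illustrative assumptions, and the xPos rotation and group normalization from the full method are omitted; this only shows that the two formulations compute the same output.

```python
# Toy single-head retention: parallel form (training) vs. recurrent form (inference).
# Illustrative shapes and gamma; xPos rotation and normalization are omitted.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4          # toy sequence length and head dimension (assumptions)
gamma = 0.9                # per-head exponential decay factor

# Stand-ins for the projected queries, keys, and values (Q = X W_Q, etc.).
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))

# ---- Parallel form -------------------------------------------------
# D[n, m] = gamma^(n - m) for n >= m, else 0 (causal decay mask).
idx = np.arange(seq_len)
D = np.where(idx[:, None] >= idx[None, :],
             gamma ** (idx[:, None] - idx[None, :]), 0.0)
out_parallel = (Q @ K.T * D) @ V

# ---- Recurrent form ------------------------------------------------
# State update: S_n = gamma * S_{n-1} + K_n^T V_n ;  output_n = Q_n S_n
S = np.zeros((d, d))
out_recurrent = np.zeros((seq_len, d))
for t in range(seq_len):
    S = gamma * S + np.outer(K[t], V[t])
    out_recurrent[t] = Q[t] @ S

# Both formulations yield the same result, which is what enables
# parallel training with O(1)-state recurrent decoding.
assert np.allclose(out_parallel, out_recurrent)
```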
OUTLINE:
0:00 - Intro
2:40 - The impossible triangle
6:55 - Parallel vs sequential
15:35 - Retention mechanism
21:00 - Chunkwise and multi-scale retention
24:10 - Comparison to other architectures
26:30 - Experimental evaluation
Paper: https://arxiv.org/abs/2307.08621
Abstract:
In …