April 27, 2023, 9:37 p.m. | Yannic Kilcher


#ai #transformer #gpt4

This paper promises to scale transformers to 1 million tokens and beyond. We take a look at the technique behind it, the Recurrent Memory Transformer, and at its strengths and weaknesses.
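The core idea of the Recurrent Memory Transformer is to split a long input into fixed-size segments and carry a small set of memory tokens from one segment to the next, so attention cost depends on the segment length rather than the full sequence. The following is a minimal sketch of that segment-level recurrence only; the `process_segment` stand-in is hypothetical (the real model runs [memory; segment; memory] through full transformer self-attention and reads the updated memory tokens back out).

```python
# Toy sketch of segment-level recurrence in the style of the Recurrent
# Memory Transformer. Not the paper's implementation: process_segment is
# a placeholder that just accumulates a running sum so the recurrence is
# easy to follow.

def process_segment(memory, segment):
    """Stand-in for one transformer pass over [memory; segment; memory].
    Here the 'updated memory' is each memory slot plus the segment sum."""
    return [m + sum(segment) for m in memory]

def rmt_forward(tokens, segment_len=4, memory_size=2):
    memory = [0.0] * memory_size  # initial (learned, in the real model) memory tokens
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        memory = process_segment(memory, segment)  # recurrence across segments
    return memory

print(rmt_forward(list(range(8))))
```

A 1-million-token input just means more iterations of the same loop: self-attention stays quadratic only in `segment_len`, while information flows across segments through the memory tokens.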

OUTLINE:
0:00 - Intro
2:15 - Transformers on long sequences
4:30 - Tasks considered
8:00 - Recurrent Memory Transformer
19:40 - Experiments on scaling and attention maps
24:00 - Conclusion

Paper: https://arxiv.org/abs/2304.11062

Abstract:
This technical report presents the application of a recurrent memory to extend …

