TransformerFAM: Feedback attention is working memory
April 28, 2024, 9:19 p.m. | Yannic Kilcher (www.youtube.com)
Abstract:
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that …