June 2, 2023, 1:01 a.m. | /u/IxinDow

[https://arxiv.org/pdf/2305.19370.pdf](https://arxiv.org/pdf/2305.19370.pdf)

It's a genuine Transformer with exact attention; no approximation tricks.

>**We use the same model architecture as the original Transformer, but with a different way of organizing the compute.**

From the conclusion:

>Our approach enables processing longer input sequences while maintaining or improving performance. Through extensive experiments, we demonstrate its effectiveness, achieving **up to a 4x memory reduction compared to memory-efficient Transformers**. Our contributions include a practical method for long context lengths in large Transformer models.

Abstract:

>Transformers have emerged as the cornerstone of …
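For intuition, here is a minimal numpy sketch of the core idea of computing exact attention block by block with a streaming softmax, so the full (seq_len x seq_len) score matrix is never materialized. This is my own illustration, not the authors' code: the block size and shapes are arbitrary, and the paper goes further by also computing the feedforward network blockwise within the same loop.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size=128):
    """Exact softmax attention computed block by block over keys/values.

    Produces the same output as standard attention, but only ever holds a
    (seq_len x block_size) slice of the score matrix in memory.
    """
    seq_len, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    # Running statistics for a numerically stable streaming softmax.
    row_max = np.full(seq_len, -np.inf)
    row_sum = np.zeros(seq_len)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]           # (b, d)
        v_blk = v[start:start + block_size]           # (b, d)
        scores = (q @ k_blk.T) * scale                # (seq_len, b)

        blk_max = scores.max(axis=1)
        new_max = np.maximum(row_max, blk_max)
        # Rescale previously accumulated output and normalizer to the new max.
        correction = np.exp(row_max - new_max)
        probs = np.exp(scores - new_max[:, None])     # unnormalized block weights
        out = out * correction[:, None] + probs @ v_blk
        row_sum = row_sum * correction + probs.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]


# Sanity check against standard full-matrix attention.
rng = np.random.default_rng(0)
q = rng.normal(size=(512, 64))
k = rng.normal(size=(512, 64))
v = rng.normal(size=(512, 64))

scores = (q @ k.T) / np.sqrt(64)
scores -= scores.max(axis=1, keepdims=True)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
reference = weights @ v

assert np.allclose(blockwise_attention(q, k, v), reference, atol=1e-6)
```

The result is bitwise equivalent up to floating-point rounding, which is what "same architecture, different way of organizing the compute" means: memory scales with the block size rather than the full sequence length, at no cost in model quality.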
