[R] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | allainews.com

May 31, 2022, 7:19 p.m. | /u/Singularian2501

Machine Learning www.reddit.com

Paper: [https://arxiv.org/abs/2205.14135](https://arxiv.org/abs/2205.14135)

Twitter: [https://twitter.com/tri\_dao/status/1531437619791290369?t=UXOZXyk1p9CCrMJLlkDcDg&s=19](https://twitter.com/tri_dao/status/1531437619791290369?t=UXOZXyk1p9CCrMJLlkDcDg&s=19)

Abstract:

" Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware …

attention io machinelearning memory

More from www.reddit.com / Machine Learning

[D] What are the most common and significant challenges moving your LLM (application/system) to production? 4 hours ago | www.reddit.com

application building challenges companies +10

[P] NLLB-200 Distill 350M for en-ko 10 hours ago | www.reddit.com

cpu english good gpu +9

[D] Real talk about RAG 18 hours ago | www.reddit.com

data deal documents machinelearning +5

[P] Classification finetuning experiments on small GPT-2 sized LLMs 23 hours ago | www.reddit.com

acc classification context cpu +16

[D] Llama-3 based OpenBioLLM-70B & 8B: Outperforms GPT-4, Gemini, Meditron-70B, Med-PaLM-1 & Med-PaLM-2 in Medical-domain 1 day ago | www.reddit.com

70b art biomedical domain +16

How do I convince my superior to do data preprocessing? [D] 1 day ago | www.reddit.com

ai engineer build chat chatbots +11

[D] Llama-3 based OpenBioLLM-70B & 8B: Outperforms GPT-4, Gemini, Meditron-70B, Med-PaLM-1 & Med-PaLM-2 in Medical-domain 1 day ago | www.reddit.com

70b art biomedical domain +16

[D] Mathematical aspects of tokenization 1 day, 3 hours ago | www.reddit.com

compression educational encoding entropy +7

[R] Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey 1 day, 4 hours ago | www.reddit.com

abstract advancement application challenges +15

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Software Engineering Manager, Generative AI - Characters

@ Meta | Bellevue, WA | Menlo Park, CA | Seattle, WA | New York City | San Francisco, CA

View on ai-jobs.net

Senior Operations Research Analyst / Predictive Modeler

@ LinQuest | Colorado Springs, Colorado, United States

View on ai-jobs.net