[R] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
May 31, 2022, 7:19 p.m. | /u/Singularian2501
Machine Learning www.reddit.com
Twitter: https://twitter.com/tri_dao/status/1531437619791290369?t=UXOZXyk1p9CCrMJLlkDcDg&s=19
Abstract:
"Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware …
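To make the "quadratic in sequence length" point concrete, here is a minimal pure-Python sketch of standard softmax attention that materializes the full N x N score matrix, plus a helper that estimates how large that matrix is. This is not the FlashAttention algorithm and not code from the paper; the function names `naive_attention` and `score_matrix_bytes`, and the assumption of 2 bytes per element (fp16), are illustrative choices only.

```python
import math


def naive_attention(Q, K, V):
    """Standard softmax attention for one head.

    Q, K, V are lists of N row vectors. Returns (output, scores), where
    scores is the full N x N matrix that standard attention materializes.
    """
    n, d = len(Q), len(Q[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = (q_i . k_j) / sqrt(d): N * N entries in total.
    scores = [
        [scale * sum(q * k for q, k in zip(Q[i], K[j])) for j in range(n)]
        for i in range(n)
    ]
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append(
            [sum(w * V[j][c] for j, w in enumerate(weights)) for c in range(len(V[0]))]
        )
    return out, scores


def score_matrix_bytes(seq_len, bytes_per_element=2):
    """Size of one N x N score matrix (assumes fp16 by default)."""
    return seq_len * seq_len * bytes_per_element


if __name__ == "__main__":
    for n in (1024, 2048, 4096):
        print(n, score_matrix_bytes(n))
```

Doubling the sequence length quadruples the size of the score matrix, and that figure is per head and per batch element. Reading and writing a matrix of this size is the kind of memory traffic an IO-aware analysis accounts for.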