May 20, 2024, 4:42 a.m. | Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

cs.LG updates on arXiv.org

arXiv:2405.10480v1 Announce Type: cross
Abstract: Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of state-of-the-art models has increased steadily, reaching billions of parameters. These huge models are memory-hungry and incur significant inference latency even on cutting-edge AI accelerators such as GPUs. Specifically, the time and memory complexity of the attention operation is quadratic in the total context length, i.e., prompt …
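To make the quadratic claim concrete, here is a minimal NumPy sketch of standard scaled dot-product attention (not the paper's proposed mechanism): the intermediate score matrix QKᵀ has n × n entries for context length n, so both compute and memory grow quadratically with n. The shapes and sizes below are illustrative assumptions, not figures from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Vanilla attention: materializes an (n, n) score matrix,
    hence O(n^2) time and memory in the context length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) -- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d)

n, d = 4096, 64                                      # context length, head dimension (illustrative)
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                                     # (4096, 64)
# The score matrix alone holds n*n float64 values:
print(f"score matrix: {n * n * 8 / 2**20:.0f} MiB")  # 128 MiB at n=4096
```

Doubling the context length quadruples the score-matrix memory, which is why long prompts strain even high-end GPUs and motivate hardware-aware attention mechanisms like the one this paper proposes.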

