Linearizing Large Language Models

May 13, 2024, 4:46 a.m. | Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar

cs.CL updates on arXiv.org arxiv.org

arXiv:2405.06640v1 Announce Type: new
Abstract: Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by proposing novel time-mixing and gating architectures, but pre-training large language models requires significant data and compute investments. Thus, the search for subquadratic …
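To make the "fixed-size recurrent state" mentioned in the abstract concrete, here is a minimal sketch of generic causal linear attention in plain Python. It is not the method proposed in the paper, whose details are cut off above. The feature map `phi` (elu(x) + 1), the function names, and the toy inputs are all assumptions chosen for illustration. The sketch computes the same outputs two ways: token by token with a state whose size does not depend on sequence length, and as a quadratic sum over all previous tokens.

```python
import math

def phi(x):
    # Assumed feature map: elu(x) + 1, which keeps every feature strictly positive.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_attention_recurrent(qs, ks, vs):
    """Causal linear attention, one token at a time, with a fixed-size state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T, shape d_k x d_v
    z = [0.0] * d_k                        # running sum of phi(k), used to normalize
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = phi(q), phi(k)
        # Fold the current key/value pair into the state.
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        # Read out with the current query; cost does not grow with position.
        norm = dot(fq, z)
        outputs.append(
            [sum(fq[i] * S[i][j] for i in range(d_k)) / norm for j in range(d_v)]
        )
    return outputs

def linear_attention_quadratic(qs, ks, vs):
    """The same quantity written as a weighted sum over all previous tokens."""
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) / total for j in range(d_v)]
        )
    return outputs

# Toy inputs (hypothetical): 3 tokens, key/query width 2, value width 3.
qs = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
ks = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.1]]
vs = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5], [2.0, 2.0, 2.0]]

recurrent = linear_attention_recurrent(qs, ks, vs)
quadratic = linear_attention_quadratic(qs, ks, vs)
```

In the recurrent form the per-token work depends only on the state dimensions (`d_k` by `d_v`), whereas the quadratic form revisits every earlier token at each step. That difference is the inference-cost advantage the abstract refers to.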
