March 28, 2024, 4:48 a.m. | Zhichao Xu

cs.CL updates on arXiv.org

arXiv:2403.18276v1 Announce Type: cross
Abstract: The Transformer architecture has achieved great success across multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). The Transformer's core mechanism, attention, requires $O(n^2)$ time complexity in training and $O(n)$ time complexity per step in inference. Many works have been proposed to improve the attention mechanism's scalability, such as Flash Attention and Multi-Query Attention. A different line of work aims to design new mechanisms to replace attention. Recently, …
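To make the complexity claim concrete, here is a minimal NumPy sketch (not from the paper, names and shapes are illustrative): full self-attention materializes an $n \times n$ score matrix, which is where the $O(n^2)$ training cost comes from, while a single KV-cached decoding step only attends over the $n$ cached positions, giving $O(n)$ per generated token.

```python
import numpy as np

def attention(Q, K, V):
    """Full self-attention over a length-n sequence.

    Q, K, V: (n, d) arrays. The score matrix is (n, n), so time and
    memory scale as O(n^2) -- the training-time cost cited in the abstract.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # (n, d)

def decode_step(q, K_cache, V_cache):
    """One autoregressive decoding step with a KV cache.

    q: (d,) query for the new token; K_cache, V_cache: (n, d) past keys/values.
    A single step attends over n cached positions, i.e. O(n) per token.
    """
    d = q.shape[-1]
    scores = K_cache @ q / np.sqrt(d)                      # (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_cache                               # (d,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 16
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out = attention(Q, K, V)       # O(n^2) over the full sequence
    step = decode_step(Q[-1], K, V)  # O(n) for one new token
    print(out.shape, step.shape)
```

Attention-replacement work such as Mamba instead uses a recurrent state of fixed size, so both the per-token cost and memory stay constant in sequence length; the trade-off the paper benchmarks is whether that efficiency holds up on document ranking quality.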

