March 28, 2024, 4:48 a.m. | Zhichao Xu

cs.CL updates on arXiv.org

arXiv:2403.18276v1 Announce Type: cross
Abstract: The Transformer architecture has achieved great success across multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). The Transformer's core mechanism, attention, requires $O(n^2)$ time complexity in training and $O(n)$ time complexity in inference. Many works have been proposed to improve the attention mechanism's scalability, such as Flash Attention and Multi-query Attention. A different line of work aims to design new mechanisms to replace attention entirely. Recently, …
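As a minimal sketch (not taken from the paper), the snippet below implements standard scaled dot-product attention with NumPy and makes the complexity claim concrete: the score matrix Q K^T has shape (n, n), which is the source of the $O(n^2)$ training cost, while autoregressive inference with cached keys/values attends each new token over all n previous tokens, giving $O(n)$ per step. The sizes and function name are illustrative assumptions.

```python
# Illustrative sketch of scaled dot-product attention (assumed standard form,
# not the paper's code). The (n, n) score matrix is the quadratic-cost term.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (n, d). Returns an (n, d) output."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # shape (n, n) -> O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # shape (n, d)

# Toy usage with hypothetical sizes: n = 8 tokens, d = 4 dimensions.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (8, 4)
```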

