DeciMamba: Exploring the Length Extrapolation Potential of Mamba
June 21, 2024, 4:47 a.m. | Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes
cs.LG updates on arXiv.org (arxiv.org)
Abstract: Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the …
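The complexity contrast in the abstract is easy to see in code. Below is a minimal, illustrative sketch (not the paper's implementation; all names, shapes, and constants are hypothetical): vanilla self-attention materializes an n x n score matrix, so its cost grows quadratically in sequence length, while a linear state-space recurrence of the kind Mamba builds on carries a fixed-size state across the sequence and scales linearly.

```python
import numpy as np

def attention_cost(x):
    """Toy self-attention: the score matrix is (n, n), so compute and
    memory grow quadratically in sequence length n. Projections are
    omitted for brevity."""
    n, d = x.shape
    q, k, v = x, x, x
    scores = q @ k.T / np.sqrt(d)                      # (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (n, d)

def ssm_recurrence(x, A, B, C):
    """A linear state-space recurrence (the family Mamba belongs to):
    h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    One fixed-size state per step, so cost is linear in n."""
    n, d = x.shape
    h = np.zeros(A.shape[0])
    ys = np.empty((n, C.shape[0]))
    for t in range(n):
        h = A @ h + B @ x[t]
        ys[t] = C @ h
    return ys

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, s = 512, 16, 32                  # sequence length, model dim, state dim
    x = rng.standard_normal((n, d))
    A = 0.9 * np.eye(s)                    # toy stable transition matrix
    B = rng.standard_normal((s, d)) * 0.1
    C = rng.standard_normal((d, s)) * 0.1
    _ = attention_cost(x)                  # O(n^2 d) time, O(n^2) memory
    _ = ssm_recurrence(x, A, B, C)         # O(n * s * (s + d)) time, O(s) state
```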