June 21, 2024, 4:47 a.m. | Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes

cs.LG updates on arXiv.org

arXiv:2406.14528v1 Announce Type: new
Abstract: Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper, we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses, we identify that these limitations arise from a restricted effective receptive field, dictated by the …
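The feed truncates the abstract above. For context on the complexity claim it opens with, below is a minimal, illustrative sketch of a Mamba-style selective state-space scan in NumPy. It is not the paper's implementation, and all names (`selective_scan`, `W_B`, `W_C`, `W_dt`) are hypothetical; it simplifies the real architecture to show one point: the recurrence carries a fixed-size hidden state, so processing cost grows linearly with sequence length L, versus self-attention's quadratic pairwise interactions.

```python
import numpy as np

def selective_scan(x, A, W_B, W_C, W_dt):
    """Illustrative Mamba-style selective scan (a sketch, not the paper's code).

    x    : (L, d) input sequence of length L with d channels
    A    : (d, n) diagonal state dynamics (negative entries for stability)
    W_B  : (d, n) projection producing the input-dependent input vector B_t
    W_C  : (d, n) projection producing the input-dependent readout vector C_t
    W_dt : (d, d) projection producing the per-step discretization Delta_t

    The hidden state h has fixed shape (d, n), independent of L, so one pass
    costs O(L * d * n): linear in L, unlike attention's O(L^2) interactions.
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                      # fixed-size state, never grows with L
    y = np.empty((L, d))
    for t in range(L):
        dt = np.logaddexp(0.0, x[t] @ W_dt)   # softplus -> positive step sizes, (d,)
        A_bar = np.exp(dt[:, None] * A)       # discretized transition in (0, 1), (d, n)
        B_t = x[t] @ W_B                      # input-dependent ("selective") input, (n,)
        C_t = x[t] @ W_C                      # input-dependent readout, (n,)
        h = A_bar * h + dt[:, None] * np.outer(x[t], B_t)   # state update, (d, n)
        y[t] = h @ C_t                        # per-channel output, (d,)
    return y

# Hypothetical usage: a random 1024-step sequence processed in O(L) time/memory.
rng = np.random.default_rng(0)
L, d, n = 1024, 16, 8
x = rng.standard_normal((L, d))
A = -np.exp(rng.standard_normal((d, n)))      # negative dynamics for stability
W_B, W_C = rng.standard_normal((2, d, n)) * 0.1
W_dt = rng.standard_normal((d, d)) * 0.1
y = selective_scan(x, A, W_B, W_C, W_dt)
```

Because the state A_bar * h decays each step, information from distant tokens fades, which is one intuition for the restricted effective receptive field the abstract refers to.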
