May 27, 2024, 4:42 a.m. | Duke Nguyen, Aditya Joshi, Flora Salim

cs.LG updates on arXiv.org

arXiv:2405.15310v1 Announce Type: new
Abstract: Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods use a subset of combinations of component functions and weight matrices within the random features paradigm. We identify the need for a systematic comparison of different combinations of weight matrix and component functions for attention learning in Transformer. In this work, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in linearized attention of the …
