May 27, 2024, 4:42 a.m. | Duke Nguyen, Aditya Joshi, Flora Salim

cs.LG updates on arXiv.org

arXiv:2405.15310v1 Announce Type: new
Abstract: Linearizing attention with various kernel approximation and kernel learning techniques has shown promise. Previous methods explore only a subset of the possible combinations of component functions and weight matrices within the random features paradigm. We identify the need for a systematic comparison of the different combinations of weight matrix and component function for attention learning in the Transformer. In this work, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in linearized attention of the …
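For context, here is a minimal sketch of the general random-features recipe the abstract refers to, not the paper's Spectraformer implementation: a feature map phi(x) = f(x W^T), built from a weight matrix W and an elementwise component function f, stands in for the softmax kernel so that attention can be computed in time linear rather than quadratic in sequence length. All function names, shapes, and the choice of f below are illustrative assumptions.

```python
import numpy as np

def feature_map(x, W, f):
    """phi(x) = f(x W^T) / sqrt(m): a random-feature map built from a
    weight matrix W of shape (m, d) and an elementwise component function f."""
    return f(x @ W.T) / np.sqrt(W.shape[0])

def linearized_attention(Q, K, V, W, f):
    """Approximate attention output phi(Q) (phi(K)^T V), row-normalized by
    phi(Q) phi(K)^T 1. Computing phi(K)^T V first keeps the cost linear in n."""
    phi_q = feature_map(Q, W, f)                    # (n, m)
    phi_k = feature_map(K, W, f)                    # (n, m)
    out = phi_q @ (phi_k.T @ V)                     # (n, d), never forms n x n
    norm = phi_q @ phi_k.sum(axis=0)[:, None]       # (n, 1) normalizer
    return out / (norm + 1e-6)

# Illustrative usage: a random Gaussian weight matrix with an exponential
# component function, which keeps the features (and normalizer) positive.
rng = np.random.default_rng(0)
n, d, m = 8, 16, 64
Q, K, V = rng.normal(size=(3, n, d))
W = rng.normal(size=(m, d))
attn = linearized_attention(Q, K, V, W, np.exp)
print(attn.shape)  # (8, 16)
```

Different choices of W (Gaussian, orthogonal, learned) and f (exp, trigonometric, learned) yield different kernel approximations; systematically comparing such combinations is the gap the abstract identifies.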
