April 1, 2024, 4:42 a.m. | Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.19928v1 Announce Type: cross
Abstract: In an effort to reduce the computational load of Transformers, research on linear attention has gained significant momentum. However, improvement strategies for attention mechanisms typically necessitate extensive retraining, which is impractical for large language models with a vast array of parameters. In this paper, we present DiJiang, a novel Frequency Domain Kernelization approach that enables the transformation of a pre-trained vanilla Transformer into a linear-complexity model at little training cost. By employing a …
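To illustrate the general idea of linear attention that the abstract builds on, here is a minimal NumPy sketch. Note this is a generic kernelized-attention example, not DiJiang's frequency-domain method; the feature map `phi` (a simple positive ReLU-based map) is an assumption for illustration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: O(n^2) in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernel trick: replace exp(q.k) with phi(q).phi(k), so the
    # (d, d_v) summary phi(K)^T V is built once and reused for every
    # query -- O(n) in n instead of O(n^2).
    Qp, Kp = phi(Q), phi(K)       # hypothetical feature map, phi > 0
    kv = Kp.T @ V                 # (d, d_v) shared summary
    z = Kp.sum(axis=0)            # (d,) normalizer
    return (Qp @ kv) / (Qp @ z)[:, None]
```

Because `phi` is strictly positive, each output row is a convex combination of the rows of `V`, mirroring the behavior of softmax attention at linear cost.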

