Feb. 19, 2024, 5:47 a.m. | Yi Lu, Xin Zhou, Wei He, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

cs.CL updates on arXiv.org

arXiv:2402.10685v1 Announce Type: new
Abstract: Large language models (LLMs) have achieved impressive performance across numerous domains but often struggle to process lengthy inputs effectively and efficiently, owing to limited length generalization and the quadratic computational cost of attention. Many works have sought to mitigate this by restricting the attention window to within the pre-trained length. However, these methods introduce new issues, such as ignoring the middle context and requiring additional training. To address these problems, we propose LongHeads, a training-free framework that enhances LLMs' long …
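The abstract cites attention's quadratic cost and the common mitigation of restricting the attention window to the pre-trained length. Below is a minimal sketch of that baseline idea (sliding-window self-attention), not the LongHeads method itself; the function and parameter names (`windowed_attention`, `window`) are illustrative choices for this example, under the assumption of single-head attention over already-projected queries, keys, and values.

```python
# Minimal sketch (not the authors' code): sliding-window self-attention,
# the mitigation the abstract describes. Each query attends only to the
# last `window` keys, so cost grows linearly with sequence length rather
# than quadratically -- at the price of ignoring context outside the window.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_attention(q, k, v, window):
    """q, k, v: (seq_len, d). Position i attends to keys in (i - window, i]."""
    seq_len, d = q.shape
    out = np.empty_like(v)
    for i in range(seq_len):
        lo = max(0, i - window + 1)
        # Scaled dot-product scores over at most `window` keys.
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        out[i] = softmax(scores) @ v[lo:i + 1]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
print(windowed_attention(q, k, v, window=4).shape)  # (16, 8)
```

As the abstract notes, this kind of window restriction keeps inference within the pre-trained length but discards the middle of long contexts, which is the gap the proposed training-free framework aims to close.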
