May 5, 2023, 4:38 a.m. | /u/RYSKZ

Machine Learning | www.reddit.com

**Abstract**:

>Transformer-based models typically have a predefined bound to their input length, because of their need to potentially attend to every token in the input. In this work, we propose Unlimiformer: a general approach that can wrap any existing pretrained encoder-decoder transformer, and offload the attention computation across all layers to a single k-nearest-neighbor index; this index can be kept on either the GPU or CPU memory and queried in sub-linear time. This way, we can index extremely long input …
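As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below replaces full cross-attention over a long input with top-k retrieval from a k-nearest-neighbor index built over the encoder's hidden states. The function names, the use of faiss, and the flat inner-product index are assumptions made for the example; a flat index searches in linear time, whereas an approximate index (e.g., IVF or HNSW) would give the sub-linear queries the abstract refers to.

```python
# Minimal sketch: cross-attention via kNN retrieval instead of attending to all tokens.
# Assumptions (not from the paper): faiss for the index, a flat inner-product index,
# and the helper names `build_index` / `knn_cross_attention`.
import numpy as np
import faiss  # kNN index; can be kept on CPU or GPU memory


def build_index(encoder_states: np.ndarray) -> faiss.Index:
    """Index all encoder hidden states (n_tokens x d_model) once per input."""
    index = faiss.IndexFlatIP(encoder_states.shape[1])  # exact inner-product search
    index.add(encoder_states.astype(np.float32))
    return index


def knn_cross_attention(query: np.ndarray, index: faiss.Index,
                        encoder_states: np.ndarray, k: int = 16) -> np.ndarray:
    """Approximate cross-attention for one decoder query using only its top-k keys."""
    scores, ids = index.search(query.astype(np.float32)[None, :], k)
    retrieved = encoder_states[ids[0]]          # (k, d_model); keys reused as values here
    weights = np.exp(scores[0] - scores[0].max())
    weights /= weights.sum()                    # softmax over the k retrieved scores only
    return weights @ retrieved                  # weighted sum of retrieved states


# Toy usage: a 100k-token "document" encoded to 64-dim states, one decoder query.
states = np.random.randn(100_000, 64).astype(np.float32)
idx = build_index(states)
context = knn_cross_attention(np.random.randn(64), idx, states)
print(context.shape)  # (64,)
```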

