June 5, 2024, 4:43 a.m. | Phoebe Klett, Thomas Ahle

cs.LG updates on arXiv.org

arXiv:2406.02332v1 Announce Type: new
Abstract: Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should be updated …


