April 22, 2024, 10:08 a.m. | /u/AIRI_Institute

Machine Learning www.reddit.com

The researchers segmented the sequence and added special memory tokens to the input: memory states from the output of the previous segment became inputs for the next one. In this way, the whole transformer acts as a recurrent cell, with memory serving as the recurrent state of the network. This approach was called the Recurrent Memory Transformer (RMT).
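A minimal sketch of that idea in PyTorch is shown below: the long input is split into segments, learnable memory tokens are prepended to each segment, and the memory slice of the transformer's output is carried over as the recurrent state for the next segment. All names and hyperparameters here (`RecurrentMemorySketch`, `num_mem_tokens`, `segment_len`, the encoder backbone) are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of the recurrent-memory idea, not the official RMT code.
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2,
                 vocab_size=30522, num_mem_tokens=8, segment_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Learnable memory tokens serve as the initial memory state.
        self.mem_tokens = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens
        self.segment_len = segment_len

    def forward(self, input_ids):
        batch = input_ids.size(0)
        # Initial recurrent state: the learned memory embeddings.
        memory = self.mem_tokens.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        # Process the long sequence one segment at a time.
        for start in range(0, input_ids.size(1), self.segment_len):
            segment = self.embed(input_ids[:, start:start + self.segment_len])
            # Prepend memory to the segment; the whole transformer acts as a
            # recurrent cell whose state is the memory slice of its output.
            x = torch.cat([memory, segment], dim=1)
            h = self.backbone(x)
            memory = h[:, :self.num_mem_tokens]           # state for next segment
            outputs.append(h[:, self.num_mem_tokens:])    # segment representations
        return torch.cat(outputs, dim=1), memory

# Usage: a sequence longer than one segment is processed recurrently.
model = RecurrentMemorySketch()
ids = torch.randint(0, 30522, (2, 512))
reps, final_memory = model(ids)
```

Note that gradients flow backward through the memory across segments, which is what lets the model learn to write facts from early segments into memory and read them later.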

The authors augmented small transformer models like BERT and GPT-2 with this memory and tested them on various question-answering tasks where facts needed for answering …

