March 14, 2024, 6:23 p.m. | /u/Sinestro101

Machine Learning www.reddit.com

I’m trying to gain a deeper understanding of the concept of memory in RNNs (and their variants) and in Transformers.

Aside from the architectural differences between the plain RNN, GRU and LSTM, memory is basically the input sequence processed through some mathematical function and passed along sequentially as input to the next time step (alongside the input X_t), sort of as a prior representation of the data.

From this technical perspective, memory seems constrained to the length of the …
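To make that picture concrete, here is a minimal sketch of the plain-RNN recurrence I have in mind (the weight names W_xh, W_hh and the toy dimensions are my own, purely for illustration): the hidden state h is the only thing carried from one time step to the next, so it is the "memory".

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One plain-RNN step: the new hidden state is a function of the current
    input x_t and the previous hidden state h_prev -- that hidden state is
    the only 'memory' carried forward."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions, illustrative only
input_dim, hidden_dim, seq_len = 4, 8, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial memory: nothing seen yet
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # memory is overwritten each step

# After the loop, h is a fixed-size summary of the whole sequence.
```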

