April 6, 2024, 8:54 p.m. | /u/Crazy_Suspect_9512

Machine Learning www.reddit.com

From the Google paper on Griffin and Hawk, it sounds like they did not use an associative scan like Mamba. Instead, they just ran the recurrent computation by brute force on TPUs. So it's natural to wonder why they didn't compare the results against an SSM with a nonlinear activation applied on the transition from step t-1 to t along the temporal axis (basically an RNN). It should be more powerful than linear-only temporal transitions, right?
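To make the distinction concrete, here is a minimal JAX sketch (my own illustration, not code from the Griffin/Hawk or Mamba papers): a linear recurrence h_t = a_t * h_{t-1} + x_t composes associatively, so it can be computed in parallel with `jax.lax.associative_scan`, while adding a tanh between steps breaks associativity and forces a sequential `jax.lax.scan`, i.e. the brute-force recurrence. All shapes and values are toy assumptions.

```python
import jax
import jax.numpy as jnp


def linear_recurrence(a, x):
    """h_t = a_t * h_{t-1} + x_t, computed in parallel via an associative scan."""
    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        # Composing the affine maps h -> a_l*h + b_l and h -> a_r*h + b_r
        # gives h -> (a_l*a_r)*h + (a_r*b_l + b_r); this composition is associative.
        return a_l * a_r, a_r * b_l + b_r

    # With h_{-1} = 0, the accumulated offset at step t is exactly h_t.
    _, h = jax.lax.associative_scan(combine, (a, x))
    return h


def nonlinear_recurrence(a, x):
    """h_t = tanh(a_t * h_{t-1} + x_t): the tanh breaks associativity,
    so the recurrence is unrolled step by step (the brute-force route)."""
    def step(h_prev, inputs):
        a_t, x_t = inputs
        h_t = jnp.tanh(a_t * h_prev + x_t)
        return h_t, h_t

    _, h = jax.lax.scan(step, jnp.zeros(x.shape[1:]), (a, x))
    return h


if __name__ == "__main__":
    T, D = 8, 4  # toy sequence length and hidden size
    a = jax.random.uniform(jax.random.PRNGKey(0), (T, D))
    x = jax.random.normal(jax.random.PRNGKey(1), (T, D))
    print(linear_recurrence(a, x).shape)     # (8, 4)
    print(nonlinear_recurrence(a, x).shape)  # (8, 4)
```

If you are already paying the sequential cost on TPU anyway (as Griffin/Hawk appear to), the nonlinear variant costs roughly the same per step, which is exactly why the comparison the post asks about seems natural.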
