[R] Uncovering mesa-optimization algorithms in Transformers (from Google Research, ETH Zürich, and Google DeepMind)
Sept. 15, 2023, 12:31 p.m. | /u/Wiskkey
Machine Learning | www.reddit.com
Abstract:
>Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization: a learned process, running within the forward pass of the model, that consists of the following two steps: (i) the construction of an internal learning objective, and (ii) its solution, found through optimization. …
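To make the two-step picture concrete, here is a minimal NumPy sketch of the kind of mechanism the paper studies: a single gradient-descent step on an internal least-squares objective built from in-context examples, which a linear self-attention layer can express as a sum of value-key outer products. This is an illustrative reconstruction, not the paper's code; the names (`eta`, `W0`, the toy regression task) are assumptions made for the example.

```python
# Hedged sketch of mesa-optimization: (i) an internal least-squares objective
# over in-context examples, (ii) one explicit gradient step that solves it,
# matching what a linear self-attention layer can compute in its forward pass.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16                       # feature dim, number of in-context examples
W_true = rng.normal(size=(1, d))   # unknown linear task (illustrative)
X = rng.normal(size=(n, d))        # in-context inputs
y = X @ W_true.T                   # in-context targets

# (i) internal learning objective: L(W) = 0.5 * sum_i ||W x_i - y_i||^2
def loss(W):
    return 0.5 * np.sum((X @ W.T - y) ** 2)

# (ii) its solution via optimization: one gradient step from W0 = 0
eta = 0.01
W0 = np.zeros((1, d))
grad = (X @ W0.T - y).T @ X        # dL/dW evaluated at W0
W1 = W0 - eta * grad               # = eta * y^T X, a sum of outer products

# A linear attention layer can realize the same update: accumulating the
# value-key outer products sum_i y_i x_i^T gives an identical prediction
# on a query x_q to the one-gradient-step predictor above.
x_q = rng.normal(size=(d,))
pred_gd = W1 @ x_q
pred_attn = eta * (y.T @ X) @ x_q  # attention-style sum of outer products
assert np.allclose(pred_gd, pred_attn)
print(f"loss before: {loss(W0):.3f}  after one mesa-step: {loss(W1):.3f}")
```

Stacking several such layers would correspond to taking several mesa-gradient steps, which is the sense in which the forward pass itself can run an optimization algorithm.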