[R] Uncovering mesa-optimization algorithms in Transformers (from Google Research, ETH Zürich, and Google DeepMind) | allainews.com

Sept. 15, 2023, 12:31 p.m. | /u/Wiskkey

Machine Learning | www.reddit.com

[Paper](https://arxiv.org/abs/2309.05858). I am not affiliated with this work or its authors.

Abstract:

>Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. …
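
As a concrete illustration of the two-step process in the abstract, here is a minimal toy sketch in Python. It is not the paper's code and makes several assumptions: scalar targets, a softmax-free (linear) self-attention read-out with hand-set weights, and an internal objective that is plain least squares on the in-context examples. Under those assumptions, step (i) builds the objective `L(w) = 1/2 * sum_i (w . x_i - y_i)^2`, and step (ii) takes one gradient-descent step on it from `w = 0`. The attention read-out then produces exactly the same query prediction. The function names `gd_step_prediction` and `linear_attention_prediction`, and the weight choices, are hypothetical.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def gd_step_prediction(xs: List[List[float]], ys: List[float],
                       x_query: List[float], lr: float) -> float:
    """(i) Internal objective: L(w) = 1/2 * sum_i (w . x_i - y_i)^2.
    (ii) Take one gradient-descent step from w = 0, then predict on x_query."""
    dim = len(x_query)
    w = [0.0] * dim
    grad = [0.0] * dim
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y          # gradient is sum_i residual_i * x_i
        for j in range(dim):
            grad[j] += residual * x[j]
    w = [w[j] - lr * grad[j] for j in range(dim)]
    return dot(w, x_query)


def linear_attention_prediction(xs: List[List[float]], ys: List[float],
                                x_query: List[float], lr: float) -> float:
    """Softmax-free self-attention over tokens [x_i, y_i] with hand-set
    (hypothetical) weights: query and keys read the x-part, values read lr * y."""
    dim = len(x_query)
    tokens = [list(x) + [y] for x, y in zip(xs, ys)]
    query = list(x_query)                 # W_Q: project query token onto its x-part
    out = 0.0
    for tok in tokens:
        key = tok[:dim]                   # W_K: keep the x-part of the token
        value = lr * tok[dim]             # W_V: scale the y-part by lr
        out += value * dot(key, query)    # unnormalised (no softmax) attention
    return out


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ys = [2.0, -1.0, 1.0]
    x_query = [0.5, 2.0]
    a = gd_step_prediction(xs, ys, x_query, lr=0.1)
    b = linear_attention_prediction(xs, ys, x_query, lr=0.1)
    print(a, b)                           # both approximately 0.15
    assert math.isclose(a, b)
```

The two functions agree because, starting from `w = 0`, one gradient step gives `w = lr * sum_i y_i x_i`, so the prediction `lr * sum_i y_i (x_i . x_query)` is a sum of values `lr * y_i` weighted by key-query dot products `x_i . x_query`, which is what unnormalised linear attention computes. The toy only shows that such an in-forward-pass optimizer is expressible; whether trained Transformers actually learn mesa-optimization is the hypothesis the paper sets out to test.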
