April 25, 2023, 1:52 p.m. | /u/LightGreenSquash

Machine Learning www.reddit.com

I think I understand the basics of how transformers work: positional encodings, the idea of attention as "differentiable dictionary indexing", how they process sequences compared to RNNs, the stack of self-attention and cross-attention layers, and so on. I've also read the original paper.
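(By "differentiable dictionary indexing" I mean the scaled dot-product attention from the original paper: each query does a soft lookup over key-value pairs. A minimal NumPy sketch of that view, with toy shapes of my own choosing:)

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Each query returns a convex combination of the value rows,
    weighted by how well it matches each key -- a "soft",
    differentiable dictionary lookup instead of a hard index.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) match scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (n_queries, d_v)

# Toy example: 2 queries over a 3-entry "dictionary"
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 8))
print(attention(Q, K, V).shape)  # (2, 8)
```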

I'm wondering if anyone has a good list of papers and resources that build on this toward **improved architectures** and/or intuitions as to **why** they work. Two parallels in CNNs, in each of those directions respectively, …
