April 25, 2023, 1:52 p.m. | /u/LightGreenSquash

Machine Learning www.reddit.com

I think I understand the basics of how transformers work: positional encodings, the idea of attention as "differentiable dictionary indexing", how they process sequences compared to RNNs, the stack of self-attention and cross-attention layers, and so on. I've also read the original paper.
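(By "differentiable dictionary indexing" I mean the scaled dot-product attention from the original paper: each query does a soft lookup over key-value pairs. A minimal NumPy sketch of that view, with toy shapes of my own choosing:)

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Each query returns a convex combination of the value rows,
    weighted by how well it matches each key -- a "soft",
    differentiable dictionary lookup instead of a hard index.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) match scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (n_queries, d_v)

# Toy example: 2 queries over a 3-entry "dictionary"
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 8))
print(attention(Q, K, V).shape)  # (2, 8)
```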

I'm wondering if anyone has a good list of papers and resources that build on this toward **improved architectures** and/or intuitions as to **why** they work. Two parallels in CNNs, in each of those directions respectively, …
