April 25, 2024, 2:16 p.m. | /u/kiockete

r/MachineLearning (www.reddit.com)

It seems to me that, thanks to the residual path, the gradient that flows back to each layer is the same regardless of the transformer layer/block. Example:

ProjectionAndCost(X + L1(X) + L2(X + L1(X)) + L3(X + L1(X) + L2(X + L1(X))) ...)

Since the input to ProjectionAndCost is just the sum of the initial embeddings and the outputs of all layers, the gradient that flows back through that sum to L1 is the same as the gradient that reaches L2 or L3 — at least along the direct residual path (each layer's output also receives extra gradient through the later blocks it feeds).
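Here is a minimal PyTorch sketch of that intuition, under illustrative assumptions: toy MLP blocks stand in for the transformer blocks and a linear head plus squared loss stands in for ProjectionAndCost (all names here are hypothetical). It unrolls the residual stream exactly as in the expression above and checks which per-layer gradients coincide:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: simple MLP blocks instead of transformer blocks,
# and a linear head + squared loss instead of ProjectionAndCost.
d = 8
blocks = nn.ModuleList(nn.Sequential(nn.Linear(d, d), nn.Tanh()) for _ in range(3))
head = nn.Linear(d, 1)

x = torch.randn(4, d)

# Unroll the residual stream exactly like the expression above:
# stream = X + L1(X) + L2(X + L1(X)) + L3(X + L1(X) + L2(X + L1(X)))
stream = x
outputs = []                       # each block's additive contribution
for blk in blocks:
    y = blk(stream)
    y.retain_grad()                # keep the gradient of this non-leaf tensor
    outputs.append(y)
    stream = stream + y

stream.retain_grad()               # gradient at the input of the head
loss = head(stream).pow(2).mean()
loss.backward()

# The gradient arriving via the *direct* residual path is the same for
# every block: it is exactly stream.grad. The last block's output gets
# only that term; earlier blocks also pick up gradient through the
# later blocks they feed, so their totals can differ.
print("grad at head input:", stream.grad.norm().item())
for i, y in enumerate(outputs, 1):
    direct_only = torch.allclose(y.grad, stream.grad)
    print(f"L{i} output grad norm: {y.grad.norm().item():.4f}  "
          f"equals direct-path grad: {direct_only}")
```

Running this shows the last block's gradient matches stream.grad exactly, while earlier blocks accumulate additional terms — which is the precise sense in which a sum hands the same gradient to every addend.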

So …
