Web: https://www.reddit.com/r/deeplearning/comments/scj82w/question_regarding_the_transformerarchitecture/

Jan. 25, 2022, 5:39 p.m. | /u/CleverProgrammer12


I am trying to implement transformers in PyTorch from scratch, feeding into the decoder block what the transformer has previously generated. In my understanding, the output of the decoder block should have dimension (according to the tutorial referenced below)

(batch_size, Ty, trg_vocab_size) 

Here Ty is the length of the input to the decoder. Do we average over it? Because we want it to generate only one word at a time, right? Why is the output of the decoder (transformer block) dependent …
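
For concreteness, here is a minimal PyTorch sketch of the shapes I mean. The layer sizes and names are placeholders of my own, not taken from the tutorial:

    import torch
    import torch.nn as nn

    batch_size, Ty, d_model, trg_vocab_size = 2, 5, 64, 1000

    # Toy decoder stack (illustrative only)
    decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=4, batch_first=True)
    decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)
    to_vocab = nn.Linear(d_model, trg_vocab_size)

    tgt = torch.randn(batch_size, Ty, d_model)    # embeddings of tokens generated so far
    memory = torch.randn(batch_size, 7, d_model)  # encoder output

    logits = to_vocab(decoder(tgt, memory))
    print(logits.shape)  # torch.Size([2, 5, 1000]), i.e. (batch_size, Ty, trg_vocab_size)

    # Nothing is averaged over Ty at generation time: the next token is read
    # from the last position only.
    next_token = logits[:, -1, :].argmax(dim=-1)  # shape: (batch_size,)

(My current understanding is that the Ty dimension is kept because, during training, all Ty positions are scored in parallel under a causal mask, while at generation time only the last position's logits are used.)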

Tags: architecture, deeplearning, transformer
