Jan. 25, 2022, 5:39 p.m. | /u/CleverProgrammer12

Deep Learning www.reddit.com

I am trying to implement transformers in PyTorch from scratch. If we feed into the decoder block what the transformer has previously generated, then in my understanding the output of the decoder block should be of dimension (according to the tutorial referenced below):

(batch_size, Ty, trg_vocab_size) 

Ty is the length of the input to the decoder. Do we average over it? Because we want it to generate only one word at a time, right? Why is the output of the decoder (transformer block) dependent …
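To make the shapes concrete, here is a minimal sketch using PyTorch's built-in nn.TransformerDecoder as a stand-in for my from-scratch version; all the sizes here (batch_size, Tx, Ty, d_model, trg_vocab_size) are made up purely for illustration:

    import torch
    import torch.nn as nn

    # made-up sizes, just for illustration
    batch_size, Tx, Ty = 2, 7, 5
    d_model, nhead, trg_vocab_size = 32, 4, 100

    # stand-in for the from-scratch decoder: a small built-in decoder stack
    # plus a linear projection to the target vocabulary
    decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)
    to_vocab = nn.Linear(d_model, trg_vocab_size)

    memory = torch.randn(batch_size, Tx, d_model)   # encoder output
    tgt = torch.randn(batch_size, Ty, d_model)      # embeddings of tokens generated so far

    # causal mask so position t only attends to positions <= t
    tgt_mask = torch.triu(torch.full((Ty, Ty), float("-inf")), diagonal=1)

    out = decoder(tgt, memory, tgt_mask=tgt_mask)   # (batch_size, Ty, d_model)
    logits = to_vocab(out)                          # (batch_size, Ty, trg_vocab_size)
    print(logits.shape)                             # torch.Size([2, 5, 100])

    # the part I am unsure about: do we average over Ty, or just read off
    # the last position to pick the next word?
    next_token_logits = logits[:, -1, :]            # (batch_size, trg_vocab_size)

So the decoder produces a full (batch_size, Ty, trg_vocab_size) tensor at every step, and my confusion is what to do with the Ty dimension when we only want one next word.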

architecture deeplearning transformer
