Feb. 1, 2024, 10:10 p.m. | /u/adeeplearner


Hello,

I'm reading this tutorial on positional encoding in the transformer architecture: [https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/](https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/)

I don't understand the very last part of it:



>What Is the Final Output of the Positional Encoding Layer?
>
>The positional encoding layer sums the positional vector with the word encoding and outputs this matrix for the subsequent layers. The entire process is shown below.



Basically it says the token (word) embedding should be added to the positional embedding. What's the justification for that? …
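
To make sure I'm reading that step right, here is a rough NumPy sketch of what I think the layer does: build the sinusoidal encoding from the article's formula and add it element-wise to the token embeddings. The token embeddings here are just random stand-ins, and the helper function name is my own:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model, n=10000):
    """Sinusoidal encoding from the tutorial:
    PE(pos, 2i)   = sin(pos / n^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / n^(2i / d_model))
    """
    pe = np.zeros((seq_len, d_model))
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = positions / n ** (2 * i / d_model)    # (seq_len, d_model/2)
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

seq_len, d_model = 4, 8

# Stand-in for the learned token (word) embeddings, random for illustration
token_embeddings = np.random.randn(seq_len, d_model)
pos_encoding = sinusoidal_positional_encoding(seq_len, d_model)

# The "final output" of the positional encoding layer: an element-wise sum
layer_output = token_embeddings + pos_encoding
print(layer_output.shape)  # (4, 8) -- same shape as the embeddings
```

So the two matrices are simply summed, and the result is passed to the next layer. My question is why adding them (rather than, say, concatenating) is the right thing to do.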

