Dec. 3, 2023, 1:07 p.m. | /u/graphitout

Deep Learning www.reddit.com

When I was first reading about attention, I assumed that the positional encoding is relative (i.e., based on the difference between the positions of the query and the key). But as per the paper "Attention Is All You Need", the positional encoding seems to be absolute: a fixed vector per position. The paper states:

"We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, P Epos+k can be represented as a linear …

