Dec. 3, 2023, 1:07 p.m. | /u/graphitout

Deep Learning www.reddit.com

When I first read about attention, I assumed that the positional encoding was relative (as in, the difference between the positions of the query and the key). But according to the paper "Attention Is All You Need", the positional encoding vector appears to be fixed. The paper states that:

"We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, P Epos+k can be represented as a linear …

