April 29, 2024, 2:30 p.m. | /u/kiockete

Machine Learning www.reddit.com

If both RoPE and ALiBi work under the assumption that we should assign increasingly lower scores the further apart two tokens are, wouldn't the score be so penalized at some point that even if there is an interesting fact 1 million tokens away, we couldn't retrieve it because the positional encoding would force it to have such a low score?
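The penalty the question describes is easiest to see in ALiBi, where the bias added to each attention logit is an explicit linear function of token distance. A minimal sketch, using the head-slope formula from the ALiBi paper (valid when the head count is a power of two); the numbers below are purely illustrative:

```python
def alibi_slopes(n_heads):
    # ALiBi assigns each head a slope from the geometric sequence
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads (n a power of two).
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(slope, distance):
    # Bias added to the attention logit for a key `distance` tokens
    # behind the query: strictly linear in distance, never saturates.
    return -slope * distance

slopes = alibi_slopes(8)
# Even the gentlest head (slope 2^-8 ≈ 0.0039) applies a penalty of
# about -3906 to the logit of a token 1 million positions away,
# which drives its post-softmax weight to effectively zero.
print(alibi_bias(min(slopes), 1_000_000))  # -3906.25
```

This illustrates why the question's concern is real for ALiBi: the bias grows without bound, so a distant fact can only survive if its content-based logit outgrows the linear penalty.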

