[Discussion] Length Generalizability of Transformers.
Dec. 18, 2023, 4 a.m. | /u/theknightmoon
Machine Learning | www.reddit.com
While reading some papers, I realized that Transformers don't generalize well to sequence lengths longer than those seen during training. I understand that ML models usually underperform when tested on a distribution that doesn't match the training data. However, some papers suggest that the positional encoding (PE) is part of the reason as well: absolute PE (APE) in particular performs poorly in this setting, whereas relative PE helps somewhat.
My question is: APE is a deterministic function, i.e., there is nothing to learn from it. It just adds to …
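For reference, here is a minimal sketch (my own illustration, not from the thread) of the sinusoidal APE from Vaswani et al. (2017), which makes the point concrete: the encoding is a fixed function of position with no learnable parameters, and it is simply added to the token embeddings.

    # Minimal sketch of sinusoidal absolute positional encoding (Vaswani et al., 2017).
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d))
    # Deterministic: no parameters are learned. Assumes d_model is even.
    import numpy as np

    def sinusoidal_ape(seq_len: int, d_model: int) -> np.ndarray:
        pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
        i = np.arange(0, d_model, 2)[None, :]      # (1, d_model // 2)
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)               # even dimensions: sine
        pe[:, 1::2] = np.cos(angles)               # odd dimensions: cosine
        return pe

    # Added to (hypothetical) token embeddings -- it literally "just adds to" them:
    # x = token_embeddings + sinusoidal_ape(seq_len, d_model)

Note that the function is defined for arbitrary positions, so the encoding itself is still well formed beyond the training length; the generalization failure is usually attributed to the model never having seen the attention patterns those unseen positions induce.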