Dec. 18, 2023, 4 a.m. | /u/theknightmoon

r/MachineLearning | www.reddit.com

Hi,

While reading some papers, I realized that Transformers don't generalize well to sequence lengths longer than those seen during training. I understand that ML models usually underperform when tested on a distribution that doesn't match the training dataset. However, some papers suggest that positional encoding (PE) is part of the reason as well: in particular, absolute PE (APE) doesn't work well in this setting, whereas relative PE helps somewhat (see the sketch below).
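To make the contrast concrete, here is a minimal numpy sketch of one relative scheme, a simplified symmetric variant of ALiBi (Press et al., 2021): it penalizes the attention logits by the relative distance |i - j|, so the encoding depends only on offsets and applies unchanged to longer sequences. The symmetric (non-causal) form here is a simplification for illustration, not the paper's exact recipe.

```python
import numpy as np

def alibi_style_bias(seq_len: int, num_heads: int) -> np.ndarray:
    # Per-head slopes as in the ALiBi paper: a geometric sequence
    # 2^(-8/n), 2^(-16/n), ..., for n heads.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # Relative distance |i - j| between query position i and key position j.
    dist = np.abs(np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :])
    # Bias of shape (num_heads, seq_len, seq_len), added to the attention
    # logits before softmax; it depends only on relative offsets, so the same
    # function applies unchanged to sequences longer than those seen in training.
    return -slopes[:, None, None] * dist[None, :, :]

# Usage: add to the per-head attention logits (shape: heads x seq x seq).
logits = np.random.randn(8, 16, 16)
logits = logits + alibi_style_bias(seq_len=16, num_heads=8)
```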

My question is: APE is a deterministic function, i.e., there is nothing to learn from it. It just adds to …
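For reference, a minimal numpy sketch of the standard sinusoidal APE from Vaswani et al. (2017): it is a fixed function of position with no learned parameters, and its output is simply added to the token embeddings.

```python
import numpy as np

def sinusoidal_ape(seq_len: int, d_model: int) -> np.ndarray:
    # Fixed (non-learned) encoding:
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    assert d_model % 2 == 0, "sketch assumes an even model dimension"
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# Usage: the encoding is added element-wise to the token embeddings.
seq_len, d_model = 128, 512
token_embeddings = np.random.randn(seq_len, d_model)
x = token_embeddings + sinusoidal_ape(seq_len, d_model)
```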

