all AI news
MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation
March 27, 2024, 4:42 a.m. | Weiguo Gao
cs.LG updates on arXiv.org arxiv.org
Abstract: When the predicted sequence length exceeds the length seen during training, the transformer's inference accuracy diminishes. Existing relative position encoding methods, such as those based on the ALiBi technique, address the length extrapolation challenge exclusively through the implementation of a single kernel function, which introduces a constant bias to every post-softmax attention scores according to their distance. These approaches do not investigate or employ multiple kernel functions to address the extrapolation challenge. Drawing on the …
abstract accuracy arxiv challenge cs.ai cs.lg encoding function implementation inference kernel multiple positional encoding through training transformer type
More from arxiv.org / cs.LG updates on arXiv.org
Digital Over-the-Air Federated Learning in Multi-Antenna Systems
2 days, 13 hours ago |
arxiv.org
Bagging Provides Assumption-free Stability
2 days, 13 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
RL Analytics - Content, Data Science Manager
@ Meta | Burlingame, CA
Research Engineer
@ BASF | Houston, TX, US, 77079