Length Generalization of Causal Transformers without Position Encoding
April 19, 2024, 4:47 a.m. | Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, Xiaoling Wang
cs.CL updates on arXiv.org (arxiv.org)
Abstract: Generalizing to longer sentences is important for recent Transformer-based language models. Besides algorithms that manipulate explicit position features, the success of Transformers without position encodings (NoPE) offers a new way to overcome this challenge. In this paper, we study the length generalization property of NoPE. We find that although NoPE can extend to longer sequences than the commonly used explicit position encodings, it still has a limited context length. We identify a connection between the failure …
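To make the NoPE setting concrete, below is a minimal PyTorch sketch of a causal self-attention layer that adds no positional signal at all: the causal mask is the only source of order information the model receives. This is an illustration of the general NoPE setup under our own assumptions, not the authors' implementation, and the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class NoPECausalSelfAttention(nn.Module):
    """Causal self-attention with no explicit position encoding (NoPE).

    No learned, sinusoidal, or rotary position embedding is applied;
    order must be inferred implicitly via the causal mask.
    (Hypothetical sketch, not the paper's code.)
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -- note: no position embedding added
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq_len, d_head)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # The causal mask is the only order signal in the NoPE setting.
        mask = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(mask, float("-inf"))
        attn = scores.softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(y)

# Usage example:
# layer = NoPECausalSelfAttention(d_model=64, n_heads=4)
# y = layer(torch.randn(2, 10, 64))  # (batch=2, seq_len=10, d_model=64)
```

Because nothing in this layer references absolute or relative positions, it can in principle be applied to any sequence length, which is why NoPE is a natural candidate for length generalization; the paper's finding is that this generalization is nonetheless bounded in practice.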