June 5, 2024, 4:52 a.m. | Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Hui Xiong

cs.CL updates on arXiv.org

arXiv:2305.11096v4 Announce Type: replace
Abstract: End-to-end sign language translation (SLT) aims to convert sign language videos directly into spoken language texts, without intermediate representations. The task is challenging because of the modality gap between sign videos and texts and the scarcity of labeled data. As a result, the input and output distributions of end-to-end sign language translation (i.e., video-to-text) are less effective than those of the gloss-to-text approach (i.e., text-to-text). To tackle these challenges, we propose a …

