VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation. (arXiv:2201.08054v1 [cs.CL])
Jan. 21, 2022, 2:10 a.m. | Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu, Sadao Kurohashi
cs.CL updates on arXiv.org
Existing multimodal machine translation (MMT) datasets consist of images and
video captions or general subtitles, which rarely contain linguistic ambiguity,
so visual information does little to help generate appropriate
translations. We introduce VISA, a new dataset that consists of 40k
Japanese-English parallel sentence pairs and corresponding video clips with the
following key features: (1) the parallel sentences are subtitles from movies
and TV episodes; (2) the source subtitles are ambiguous, which means they have
multiple possible translations with different meanings; …