VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation (arXiv:2201.08054v1 [cs.CL])

Jan. 21, 2022, 2:10 a.m. | Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu, Sadao Kurohashi

cs.CL updates on arXiv.org arxiv.org

Existing multimodal machine translation (MMT) datasets consist of images and
video captions or general subtitles, which rarely contain linguistic ambiguity,
so visual information does little to help generate appropriate
translations. We introduce VISA, a new dataset that consists of 40k
Japanese-English parallel sentence pairs and corresponding video clips with the
following key features: (1) the parallel sentences are subtitles from movies
and TV episodes; (2) the source subtitles are ambiguous, which means they have
multiple possible translations with different meanings; …


