Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair
April 19, 2024, 4:42 a.m. | Yusuke Sakai, Mana Makinae, Hidetaka Kamigaito, Taro Watanabe
cs.LG updates on arXiv.org | arxiv.org
Abstract: In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems. However, it is very challenging to curate such a corpus due to limitations in the abilities of annotators, and hence, existing SI corpora are limited. Therefore, we propose a method to convert existing speech translation corpora into interpretation-style data, maintaining the original word order and preserving the entire source content using Large …