The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation
March 28, 2024, 4:48 a.m. | Nicolas Guerin, Shane Steinert-Threlkeld, Emmanuel Chemla
cs.CL updates on arXiv.org
Abstract: Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. We find, contrary to popular belief, that (i) parallel word frequency distributions, (ii) partially shared vocabulary, and (iii) similar syntactic structure across languages …
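To make the training method concrete: in on-the-fly back-translation, monolingual target-language text is translated back into the source language by the current model, and the resulting synthetic pairs are used as supervision. The toy sketch below (illustrative only; the artificial languages and word mappings are assumptions, not the paper's actual setup) shows the synthetic-pair construction for two languages with a partially shared vocabulary:

```python
# Toy sketch of back-translation pair construction between two
# hypothetical artificial languages. Vocabularies are illustrative
# assumptions, not taken from the paper.

# Partially shared vocabulary: "tree" exists in both languages.
L1_TO_L2 = {"sun": "sol", "moon": "luna", "tree": "tree"}
L2_TO_L1 = {v: k for k, v in L1_TO_L2.items()}

def translate(sentence, lexicon):
    """Word-by-word 'translation'; unknown words pass through unchanged."""
    return [lexicon.get(w, w) for w in sentence]

def back_translate(monolingual_l2, l2_to_l1):
    """Build synthetic training pairs: translate each L2 sentence
    back into L1, then pair (synthetic L1, original L2) so a model
    can be trained on the L1 -> L2 direction."""
    pairs = []
    for sent in monolingual_l2:
        synthetic_l1 = translate(sent, l2_to_l1)
        pairs.append((synthetic_l1, sent))
    return pairs

corpus_l2 = [["sol", "tree"], ["luna"]]
pairs = back_translate(corpus_l2, L2_TO_L1)
print(pairs)  # [(['sun', 'tree'], ['sol', 'tree']), (['moon'], ['luna'])]
```

In a real system the lexicon lookup is replaced by the model's own (initially noisy) translations, which is exactly why the language properties the paper studies — word-frequency distributions, vocabulary overlap, and syntactic similarity — can determine whether the procedure converges.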