Web: http://arxiv.org/abs/2112.08352

May 6, 2022, 1:12 a.m. | Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning H

cs.LG updates on arXiv.org

We present a textless speech-to-speech translation (S2ST) system that
translates speech from one language into another and can be built without
any text data. Unlike existing work in the literature, we tackle the
challenge of modeling multi-speaker target speech and train the systems on
real-world S2ST data. The key to our approach is a self-supervised,
unit-based speech normalization technique, which fine-tunes a pre-trained
speech encoder with paired audio from multiple speakers and a
single …


