Web: http://arxiv.org/abs/2205.02670

May 6, 2022, 1:11 a.m. | Wei Wei, Huang Hengguan, Gu Xiangming, Wang Hao, Wang Ye

cs.LG updates on arXiv.org

Content mismatch usually occurs when data from one modality is translated to
another, e.g. language learners producing mispronunciations (errors in speech)
when reading a sentence (target text) aloud. However, most existing alignment
algorithms assume the content involved in the two modalities is perfectly
matched, which makes it difficult to locate such mismatches between speech
and text. In this work, we develop an unsupervised learning algorithm that can
infer the relationship between content-mismatched cross-modal sequential data,
especially for speech-text sequences. …


