all AI news
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis. (arXiv:2306.09417v3 [eess.AS] UPDATED)
cs.LG updates on arXiv.org arxiv.org
With read-aloud speech synthesis achieving high naturalness scores, there is
a growing research interest in synthesising spontaneous speech. However, human
spontaneous face-to-face conversation has both spoken and non-verbal aspects
(here, co-speech gestures). Only recently has research begun to explore the
benefits of jointly synthesising these two modalities in a single system. The
previous state of the art used non-probabilistic methods, which fail to capture
the variability of human speech and motion, and risk producing oversmoothing
artefacts and sub-optimal synthesis quality. …
arxiv benefits conversation denoising diff explore face human research speech spoken synthesis verbal