Feb. 13, 2024, 5:43 a.m. | Kenichi Fujita, Atsushi Ando, Yusuke Ijima

cs.LG updates on arXiv.org

This paper proposes a speech-rhythm-based speaker-embedding method that models phoneme duration from only a few utterances by the target speaker. Speech rhythm is one of the essential speaker characteristics, along with acoustic features such as F0, for reproducing individual utterances in speech synthesis. The novel feature of the proposed method is rhythm-based embeddings extracted from phonemes and their durations, which are known to be related to speaking rhythm. They are extracted with a speaker identification …
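To make the idea concrete, here is a minimal, hypothetical sketch of a rhythm-based embedding: phoneme identities and their durations are pooled into a single fixed-size speaker vector. This is an illustrative toy, not the authors' actual model; the lookup table stands in for a trained phoneme embedding layer, and all names and dimensions (`N_PHONEMES`, `EMB_DIM`, `rhythm_embedding`) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PHONEMES = 40   # assumed phoneme inventory size
EMB_DIM = 16      # assumed embedding dimensionality

# Randomly initialised lookup table standing in for a trained
# phoneme embedding layer (in the paper, such representations would
# be learned via a speaker identification objective).
phoneme_table = rng.standard_normal((N_PHONEMES, EMB_DIM))

def rhythm_embedding(phoneme_ids, durations_ms):
    """Pool phoneme identities and durations into one fixed-size vector.

    Each phoneme vector is scaled by the log of its duration, so
    speakers who stretch or compress the same phonemes differently
    map to different embeddings -- a crude stand-in for rhythm
    modelling.
    """
    vecs = phoneme_table[np.asarray(phoneme_ids)]
    weights = np.log1p(np.asarray(durations_ms, dtype=float))
    pooled = (vecs * weights[:, None]).mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-8)  # unit-normalise

# Same phoneme sequence spoken fast vs. slow yields distinct embeddings.
ids = [3, 17, 5, 22]
e_fast = rhythm_embedding(ids, [50, 60, 40, 55])
e_slow = rhythm_embedding(ids, [120, 150, 90, 130])
```

In a real system the pooling network would be trained jointly with a speaker identification loss so that the embedding space separates speakers by their duration patterns; mean pooling over log durations is used here only to keep the sketch self-contained.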

cs.CL cs.LG cs.SD eess.AS
