March 14, 2024, 4:42 a.m. | Ziqi Liang, Haoxiang Shi, Jiawei Wang, Keda Lu

cs.LG updates on arXiv.org

arXiv:2403.08164v1 Announce Type: cross
Abstract: Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks (RNNs) have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model that includes RNN components demands powerful GPUs and takes a long time. In contrast, CNN-based sequence synthesis techniques can significantly reduce the number of parameters and the training time of a TTS model while guaranteeing a certain level of performance due to their …
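The abstract's core claim is architectural: replacing recurrent layers with 1D convolutions removes the timestep-by-timestep dependency and can shrink the parameter count. The sketch below illustrates that contrast in PyTorch; the layer sizes, dilation pattern, and framework choice are assumptions made for illustration, not the paper's actual model.

# Hedged sketch: an RNN layer versus a 1D-convolutional sequence block for
# TTS-style sequence modeling. This illustrates the general idea in the
# abstract, NOT the paper's architecture; sizes and framework are assumptions.
import torch
import torch.nn as nn

hidden = 256          # assumed channel/hidden size
seq_len, batch = 400, 8

# RNN baseline: recurrence forces sequential, timestep-by-timestep computation.
rnn = nn.LSTM(input_size=hidden, hidden_size=hidden, num_layers=2, batch_first=True)

# CNN alternative: a small stack of dilated 1D convolutions covers a long
# receptive field and processes every timestep in parallel on the GPU.
cnn = nn.Sequential(
    nn.Conv1d(hidden, hidden, kernel_size=3, padding=1, dilation=1),
    nn.ReLU(),
    nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
    nn.ReLU(),
    nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4),
)

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

x = torch.randn(batch, seq_len, hidden)
rnn_out, _ = rnn(x)                                 # (batch, seq_len, hidden)
cnn_out = cnn(x.transpose(1, 2)).transpose(1, 2)    # Conv1d expects (batch, channels, time)

print(f"LSTM parameters: {n_params(rnn):,}")        # ~1.05M for these assumed sizes
print(f"CNN  parameters: {n_params(cnn):,}")        # ~0.59M for these assumed sizes

With these assumed sizes, the convolutional stack has roughly half the parameters of the two-layer LSTM and, unlike the LSTM, does not have to unroll over timesteps during training, which is the kind of saving in parameters and training time the abstract describes.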

