June 27, 2024, 4:42 a.m. | Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

cs.CL updates on arXiv.org arxiv.org

arXiv:2308.06873v2 Announce Type: replace-cross
Abstract: Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with …

arxiv codec cs.cl cs.lg cs.sd eess.as language language model replace speech transformer type

Quantitative Researcher – Algorithmic Research

@ Man Group | GB London Riverbank House

Software Engineering Expert

@ Sanofi | Budapest

Senior Bioinformatics Scientist

@ Illumina | US - Bay Area - Foster City

Senior Engineer - Generative AI Product Engineering (Remote-Eligible)

@ Capital One | McLean, VA

Graduate Assistant - Bioinformatics

@ University of Arkansas System | University of Arkansas at Little Rock

Senior AI-HPC Cluster Engineer

@ NVIDIA | US, CA, Santa Clara