SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
June 27, 2024, 4:42 a.m. | Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka
cs.CL updates on arXiv.org
Abstract: Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with …