July 1, 2022, 1:10 a.m. | Kyle Kastner, Aaron Courville

cs.LG updates on arXiv.org

This paper introduces R-MelNet, a two-part autoregressive architecture with a
frontend based on the first tier of MelNet and a backend WaveRNN-style audio
decoder for neural text-to-speech synthesis. Taking as input a mixed sequence
of characters and phonemes, with an optional audio priming sequence, this model
produces low-resolution mel-spectral features which are interpolated and used
by a WaveRNN decoder to produce an audio waveform. Coupled with half-precision
training, R-MelNet uses under 11 gigabytes of GPU memory on a single …
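
The interpolation step between the frontend and the decoder can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which interpolation scheme R-MelNet uses, so linear interpolation along the time axis stands in for it, and the function name `upsample_mel`, the upsampling factor of 4, and the 50 x 80 feature shape are all hypothetical.

```python
import numpy as np


def upsample_mel(mel, factor):
    """Linearly interpolate a (frames, n_mels) array along the time axis.

    Assumption: linear interpolation is a stand-in; the scheme used by
    R-MelNet is not given in the abstract.
    """
    n_frames, n_mels = mel.shape
    src = np.arange(n_frames)                        # original frame positions
    dst = np.linspace(0, n_frames - 1, n_frames * factor)  # denser time grid
    # np.interp is 1-D, so interpolate each mel channel separately.
    return np.stack(
        [np.interp(dst, src, mel[:, m]) for m in range(n_mels)], axis=1
    )


# Hypothetical low-resolution features: 50 frames, 80 mel bins.
rng = np.random.default_rng(0)
low_res = rng.standard_normal((50, 80))
high_res = upsample_mel(low_res, 4)
print(high_res.shape)
```

In a pipeline like the one described, the upsampled features would then serve as conditioning input to the waveform decoder; that stage is not sketched here.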

