ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition. (arXiv:2209.14868v1 [cs.SD])
cs.CL updates on arXiv.org
The recurrent neural network transducer (RNN-T) is a prominent streaming
end-to-end (E2E) automatic speech recognition (ASR) technology. In RNN-T, the
acoustic encoder commonly
consists of stacks of LSTMs. Very recently, the Conformer architecture was
introduced as an alternative to LSTM layers: the encoder of RNN-T is replaced
with a modified Transformer encoder that has convolutional layers at the
frontend and between attention layers. In this paper, we introduce a new
streaming ASR model, Convolutional Augmented Recurrent Neural Network
Transducers (ConvRNN-T) in which …