Content-Context Factorized Representations for Automated Speech Recognition. (arXiv:2205.09872v1 [eess.AS])
cs.LG updates on arXiv.org
Deep neural networks have demonstrated their ability to perform
automated speech recognition (ASR) by extracting meaningful features from input
audio frames. Such features, however, may contain information not only about
the spoken language content but also about unnecessary contexts, such as
background noise and sounds, or speaker identity, accent, or protected
attributes. Such information can directly harm generalization performance by
introducing spurious correlations between the spoken words and the context in
which such words were …