Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation. (arXiv:2210.15398v2 [cs.CL] UPDATED)
cs.CL updates on arXiv.org
Data augmentation is a technique to generate new training data based on
existing data. We evaluate the simple and cost-effective method of
concatenating the original data examples to build new training instances.
Continued training with such augmented data improves off-the-shelf
Transformer and Conformer models that were optimized on the original data only.
We demonstrate considerable improvements on the LibriSpeech-960h test sets (WER
2.83 and 6.87 for test-clean and test-other), which carry over to models
combined with shallow …
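
The concatenation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how examples are paired or how many new instances are built, so the random pairing, the `num_new` parameter, the function name `concat_augment`, and the toy list-based representation of audio and transcripts are all assumptions made for illustration, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

# Toy representation (assumption): an example is (audio samples, transcript tokens).
Example = Tuple[List[float], List[str]]


def concat_augment(data: Sequence[Example], num_new: int, seed: int = 0) -> List[Example]:
    """Build new training instances by concatenating two existing examples.

    Pairs are drawn uniformly at random with replacement; this pairing
    strategy is an assumption, since the abstract does not specify one.
    """
    rng = random.Random(seed)
    augmented: List[Example] = []
    for _ in range(num_new):
        audio_a, text_a = rng.choice(data)
        audio_b, text_b = rng.choice(data)
        # Concatenate source side and target side in the same order.
        augmented.append((audio_a + audio_b, text_a + text_b))
    return augmented


original: List[Example] = [
    ([0.1, 0.2], ["hello"]),
    ([0.3], ["world"]),
    ([0.4, 0.5, 0.6], ["good", "morning"]),
]

new_examples = concat_augment(original, num_new=4)
# Keeping the original examples alongside the new ones is also an assumption.
training_set = list(original) + new_examples
```

In this sketch, continued training would then run on `training_set`, starting from a model already optimized on `original`.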