ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation. (arXiv:2211.13189v1 [cs.SD])
cs.CV updates on arXiv.org
Transformers, which were originally developed for natural language
processing, have recently generated significant interest in the computer vision
and audio communities due to their flexibility in learning long-range
relationships. Constrained by the data-hungry nature of transformers and limited
labelled data, most transformer-based models for audio tasks are finetuned from
ImageNet-pretrained models, despite the huge gap between the natural image
domain and the audio domain. This has motivated research in self-supervised
pretraining of audio transformers, which reduces the dependency …