Nov. 24, 2022, 7:17 a.m. | Sara Atito, Muhammad Awais, Wenwu Wang, Mark D Plumbley, Josef Kittler

cs.CV updates on arXiv.org

Vision transformers, which were originally developed for natural language
processing, have recently generated significant interest in the computer vision
and audio communities due to their flexibility in learning long-range
relationships. Constrained by the data-hungry nature of transformers and the
limited amount of labelled data, most transformer-based models for audio tasks
are fine-tuned from ImageNet-pretrained models, despite the large gap between
the natural image domain and the audio domain. This has motivated research into
self-supervised pretraining of audio transformers, which reduces the dependency …

Tags: arxiv, audio, general, representation, spectrogram, transformer, vision
