ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
March 12, 2024, 4:50 a.m. | Sara Atito, Muhammad Awais, Wenwu Wang, Mark D Plumbley, Josef Kittler
cs.CV updates on arXiv.org
Abstract: Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships. Constrained by the data-hungry nature of transformers and the limited amount of labelled data, most transformer-based models for audio tasks are fine-tuned from ImageNet-pretrained models, despite the huge gap between the domain of natural images and audio. This has motivated the research in self-supervised …