Nov. 4, 2022, 1:16 a.m. | Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

cs.CL updates on arXiv.org arxiv.org

We present Multiscale Audio Spectrogram Transformer (MAST) for audio
classification, which brings the concept of multiscale feature hierarchies to
the Audio Spectrogram Transformer (AST). Given an input audio spectrogram, we
first patchify and project it into an initial temporal resolution and embedding
dimension, after which the multiple stages in MAST progressively expand the
embedding dimension while reducing the temporal resolution of the input. We use
a pyramid structure that allows early layers of MAST operating at a high
temporal resolution …
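
The stage-wise trade-off described above (temporal resolution shrinks while embedding dimension grows) can be illustrated with a small shape calculation. This is only a sketch under assumed settings, not the authors' configuration: the function name `multiscale_schedule`, the patch stride of 4, the initial dimension of 96, the four stages, and the factor-of-2 pooling and widening are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StageShape:
    stage: int       # 1-based stage index
    time_steps: int  # temporal resolution (number of tokens along time)
    embed_dim: int   # embedding (channel) dimension


def multiscale_schedule(
    num_frames: int,
    patch_stride: int = 4,   # assumed temporal stride of the patchify step
    init_dim: int = 96,      # assumed initial embedding dimension
    num_stages: int = 4,     # assumed number of stages
    time_pool: int = 2,      # assumed temporal reduction per stage transition
    dim_mult: int = 2,       # assumed embedding expansion per stage transition
) -> List[StageShape]:
    """Return the (time_steps, embed_dim) pair seen by each stage."""
    if num_frames < patch_stride:
        raise ValueError("input is shorter than one patch")

    # Patchify + project: sets the initial temporal resolution and dimension.
    time_steps = num_frames // patch_stride
    dim = init_dim
    shapes = [StageShape(1, time_steps, dim)]

    # Each later stage trades temporal resolution for embedding width.
    for stage in range(2, num_stages + 1):
        time_steps = max(1, time_steps // time_pool)
        dim *= dim_mult
        shapes.append(StageShape(stage, time_steps, dim))
    return shapes


if __name__ == "__main__":
    for s in multiscale_schedule(num_frames=1024):
        print(f"stage {s.stage}: time_steps={s.time_steps}, embed_dim={s.embed_dim}")
```

With these assumed values, early stages see many time steps at a narrow embedding width, and later stages see few time steps at a wide one, which is the pyramid shape the abstract refers to.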

arxiv audio spectrogram transformers
