May 22, 2024, 4:43 a.m. | Leonardo Pepino, Pablo Riera, Luciana Ferrer

cs.LG updates on arXiv.org

arXiv:2309.07391v2 Announce Type: replace-cross
Abstract: The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music, and environmental sounds. To approach this problem, methods inspired by self-supervised learning in NLP (e.g., BERT) or computer vision (e.g., masked autoencoders, MAE) are often adapted to the audio domain. In this work, we propose masking representations of the audio signal, and training a MAE to reconstruct the …
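The masking-and-reconstruction idea the abstract describes can be sketched as follows. This is an illustrative NumPy-only toy, not the authors' implementation: the feature shapes, the 75% mask ratio, and the use of plain MSE on masked frames are assumptions borrowed from common MAE-style setups.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_frames(feats, mask_ratio=0.75):
    """Randomly hide a fraction of time frames in a (T, D) feature matrix.

    Returns the masked features (hidden frames zeroed out) and a boolean
    mask marking which frames the model must reconstruct.
    """
    T, _ = feats.shape
    n_masked = int(round(T * mask_ratio))
    idx = rng.permutation(T)[:n_masked]
    mask = np.zeros(T, dtype=bool)
    mask[idx] = True
    masked = feats.copy()
    masked[mask] = 0.0
    return masked, mask

def reconstruction_loss(pred, target, mask):
    """MSE computed only on the masked frames, as in MAE-style training."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Toy example: 100 frames of 64-dimensional "audio representations".
feats = rng.standard_normal((100, 64))
masked, mask = mask_frames(feats)

# In a real model, an encoder/decoder would predict the hidden frames;
# here the zeroed input stands in for the decoder output.
loss = reconstruction_loss(masked, feats, mask)
```

In an actual MAE pipeline, the encoder would see only the visible frames and a lightweight decoder would predict the masked ones; the loss above would then be backpropagated through both.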

