April 2, 2024, 7:52 p.m. | Heng-Jui Chang, James Glass

cs.CL updates on arXiv.org

arXiv:2311.09117v2 Announce Type: replace
Abstract: This paper introduces Robust Spin (R-Spin), a data-efficient, domain-specific self-supervision method for speaker- and noise-invariant speech representations that learns discrete acoustic units with speaker-invariant clustering (Spin). R-Spin resolves Spin's issues and enhances content representations by learning to predict acoustic pieces. R-Spin offers a 12x reduction in computational resources compared to previous state-of-the-art methods while outperforming them in severely distorted speech scenarios. This paper provides detailed analyses to show how discrete units contribute to speech encoder …

