April 26, 2024, 4:47 a.m. | Kangwook Jang, Sungnyun Kim, Hoirin Kim

cs.CL updates on arXiv.org arxiv.org

arXiv:2312.09040v2 Announce Type: replace-cross
Abstract: Despite the strong performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter size and computational cost make them impractical to deploy. In this study, we propose to compress speech SSL models by distilling speech temporal relation (STaR). Unlike previous works that directly match the representation of each speech frame, STaR distillation transfers the temporal relation between speech frames, which is better suited to a lightweight student with limited capacity. We explore three STaR distillation objectives …
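
The excerpt does not spell out the three STaR objectives, but the core idea of distilling frame-to-frame temporal relations (rather than per-frame features) can be illustrated with a minimal sketch. The sketch below assumes a relation matrix built from pairwise cosine similarities over time and matched with an MSE loss; the function name and this particular formulation are illustrative, not the paper's exact method.

```python
# Minimal sketch of a temporal-relation distillation loss, assuming the
# relation is a (time x time) cosine-similarity matrix per utterance.
# This is an illustrative formulation, not the paper's exact objective.
import torch
import torch.nn.functional as F


def temporal_relation_loss(teacher_feats: torch.Tensor,
                           student_feats: torch.Tensor) -> torch.Tensor:
    """Match frame-to-frame similarity matrices of teacher and student.

    teacher_feats: (batch, time, d_teacher)
    student_feats: (batch, time, d_student)
    Feature dimensions may differ, since only the (time x time)
    relation matrices are compared, not the frame features themselves.
    """
    t = F.normalize(teacher_feats, dim=-1)
    s = F.normalize(student_feats, dim=-1)
    rel_teacher = torch.bmm(t, t.transpose(1, 2))  # (batch, time, time)
    rel_student = torch.bmm(s, s.transpose(1, 2))  # (batch, time, time)
    return F.mse_loss(rel_student, rel_teacher)


# Usage: a wide teacher hidden-state sequence distilled into a narrower student.
teacher = torch.randn(8, 100, 768)   # e.g., teacher SSL hidden states
student = torch.randn(8, 100, 256)   # lightweight student with smaller width
loss = temporal_relation_loss(teacher, student)
```

Because only the relation matrices are compared, no projection layer is needed to bridge the teacher and student feature dimensions, which is one reason relation transfer suits a low-capacity student.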
