Feb. 6, 2024, 5:54 a.m. | Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai

The efficacy of self-supervised speech models has been validated, yet how best to use their representations across diverse tasks remains challenging. In this study, we investigate Acoustic Word Embeddings (AWEs), fixed-length features derived from continuous representations, to explore their advantages in specific tasks. AWEs have previously proven useful for capturing acoustic discriminability. Building on this, we propose measuring the layer-wise similarity between AWEs and word embeddings to further investigate the context inherent in AWEs. Moreover, we evaluate …
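The abstract describes AWEs as fixed-length vectors derived from frame-level representations and proposes comparing them with word embeddings layer by layer. The sketch below illustrates that idea under stated assumptions: each AWE is a mean-pool over a word's frames, and the similarity measure is representational similarity analysis (RSA), that is, the Pearson correlation between the pairwise cosine similarities of the AWEs and those of the word embeddings. RSA is a stand-in chosen here because it compares spaces of different dimensionality; the paper's own pooling and similarity metric are not given in this excerpt and may differ. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average a variable-length sequence of frame vectors into one fixed-length AWE (assumed pooling)."""
    if not frames:
        raise ValueError("a word segment needs at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def pairwise_upper(vectors: Sequence[Sequence[float]]) -> Vector:
    """Cosine similarity for every unordered pair of words (upper triangle of the matrix)."""
    n = len(vectors)
    return [cosine(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sx * sy)


def rsa_similarity(awes: Sequence[Vector], word_embs: Sequence[Vector]) -> float:
    """Compare two embedding spaces (possibly of different dimension) via their similarity structure."""
    return pearson(pairwise_upper(awes), pairwise_upper(word_embs))


def layerwise_similarity(layer_frames, word_embs) -> List[float]:
    """layer_frames[l][w] holds the frame vectors of word w at layer l; returns one score per layer."""
    scores = []
    for per_word in layer_frames:
        awes = [mean_pool(frames) for frames in per_word]
        scores.append(rsa_similarity(awes, word_embs))
    return scores


# Toy example: 3 words, 2-dimensional word embeddings, 3-dimensional frames, 2 layers.
word_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_frames = [
    # Layer 0: pooled AWEs mirror the word-embedding geometry.
    [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[2.0, 2.0, 0.0], [0.0, 0.0, 0.0]]],
    # Layer 1: pooled AWEs have a different similarity structure.
    [[[1.0, 1.0, 0.0]], [[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]],
]
print(layerwise_similarity(layer_frames, word_embs))  # approximately [1.0, -0.5]
```

A higher score at a layer would indicate that the AWEs from that layer arrange words more like the word embeddings do. RSA needs at least three words, since it correlates pairwise similarities.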

cs.CL cs.SD eess.AS

