March 12, 2024, 4:51 a.m. | Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

cs.CL updates on arXiv.org arxiv.org

arXiv:2403.06487v1 Announce Type: new
Abstract: This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data, encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual …

abstract application arxiv attention cs.cl cs.sd data dialogue dynamic eess.as english japanese multilingual paper prediction predictive projection spoken transformer type voice

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Global Data Architect, AVP - State Street Global Advisors

@ State Street | Boston, Massachusetts

Data Engineer

@ NTT DATA | Pune, MH, IN