March 29, 2024, 4:48 a.m. | Eleanor Chodroff, Blaž Pažon, Annie Baker, Steven Moran

cs.CL updates on arXiv.org

arXiv:2403.19509v1
Abstract: Research in speech technologies and comparative linguistics depends on access to diverse and accessible speech data. The UCLA Phonetics Lab Archive is one of the earliest multilingual speech corpora, with long-form audio recordings and phonetic transcriptions for 314 languages (Ladefoged et al., 2009). Recently, 95 of these languages were time-aligned with word-level phonetic transcriptions (Li et al., 2021). Here we present VoxAngeles, a corpus of audited phonetic transcriptions and phone-level alignments of the UCLA Phonetics …
