all AI news
Phonetic Segmentation of the UCLA Phonetics Lab Archive
March 29, 2024, 4:48 a.m. | Eleanor Chodroff, Bla\v{z} Pa\v{z}on, Annie Baker, Steven Moran
cs.CL updates on arXiv.org arxiv.org
Abstract: Research in speech technologies and comparative linguistics depends on access to diverse and accessible speech data. The UCLA Phonetics Lab Archive is one of the earliest multilingual speech corpora, with long-form audio recordings and phonetic transcriptions for 314 languages (Ladefoged et al., 2009). Recently, 95 of these languages were time-aligned with word-level phonetic transcriptions (Li et al., 2021). Here we present VoxAngeles, a corpus of audited phonetic transcriptions and phone-level alignments of the UCLA Phonetics …
abstract arxiv audio audio recordings cs.cl cs.sd data diverse eess.as form lab languages linguistics multilingual research segmentation speech technologies type ucla
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Lead Data Scientist, Commercial Analytics
@ Checkout.com | London, United Kingdom
Data Engineer I
@ Love's Travel Stops | Oklahoma City, OK, US, 73120