DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon. (arXiv:2206.11332v1 [cs.CL])
Web: http://arxiv.org/abs/2206.11332
June 24, 2022, 1:12 a.m. | Robin Algayres, Tristan Ricoul, Julien Karadayi, Hugo Laurençon, Salah Zaiem, Abdelrahman Mohamed, Benoît Sagot, Emmanuel Dupoux
Source: cs.CL updates on arXiv.org
Finding word boundaries in continuous speech is challenging as there is
little or no equivalent of a 'space' delimiter between words. Popular Bayesian
non-parametric models for text segmentation use a Dirichlet process to jointly
segment sentences and build a lexicon of word types. We introduce DP-Parse,
which uses similar principles but only relies on an instance lexicon of word
tokens, avoiding the clustering errors that arise with a lexicon of word types.
On the Zero Resource Speech Benchmark 2017, our …
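
To make the Dirichlet-process idea mentioned above concrete, here is a minimal sketch of the text-segmentation setting with a lexicon of word types. It is not DP-Parse and does not operate on speech. It assumes a unigram Dirichlet-process predictive probability, (count + alpha * P0(word)) / (total + alpha), and replaces the usual sampling-based inference with a single best-path search over a fixed lexicon. The names `base_prob`, `dp_word_prob`, and `segment`, and the parameters `alpha`, `p_stop`, `alphabet_size`, and `max_len`, are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def base_prob(word, alphabet_size=26, p_stop=0.5):
    """Base distribution P0 (assumed): geometric length, uniform characters."""
    length_prob = p_stop * (1 - p_stop) ** (len(word) - 1)
    return length_prob * (1 / alphabet_size) ** len(word)


def dp_word_prob(word, counts, total, alpha=1.0):
    """Dirichlet-process predictive probability of `word` given lexicon counts."""
    return (counts[word] + alpha * base_prob(word)) / (total + alpha)


def segment(text, counts, alpha=1.0, max_len=10):
    """Most probable segmentation of `text` under fixed lexicon counts.

    Dynamic programming over end positions; a simplification of the
    joint segment-and-build-lexicon inference, which would also update
    `counts` as it goes.
    """
    total = sum(counts.values())
    n = len(text)
    # best[i] = (log-probability of best split of text[:i], start of last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            score = best[start][0] + math.log(dp_word_prob(word, counts, total, alpha))
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the string.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy lexicon of word types with made-up counts.
lexicon = Counter({"the": 5, "dog": 3, "runs": 2})
print(segment("thedogruns", lexicon))
```

In this toy setting, words already in the lexicon get most of the probability mass, while unseen strings fall back on the base distribution, which penalizes long novel words. A lexicon of word types like this one requires deciding which tokens count as the same word. In text that is trivial string equality, but in speech it requires clustering, which is the source of error the abstract says an instance lexicon avoids.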