Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information
April 25, 2024, 5:44 p.m. | Chihiro Taguchi, Jefferson Saransig, Dayana Velásquez, David Chiang
cs.CL updates on arXiv.org
Abstract: This paper presents Killkan, the first dataset for automatic speech recognition (ASR) in Kichwa, an indigenous language of Ecuador. Kichwa is an extremely low-resource, endangered language, and before Killkan no resources existed that would allow it to be incorporated into natural language processing applications. The dataset contains approximately 4 hours of audio with transcription, translation into Spanish, and morphosyntactic annotation in the Universal Dependencies format. The audio data was retrieved from …