Web: http://arxiv.org/abs/2201.10881

Jan. 27, 2022, 2:10 a.m. | Per Erik Solberg, Pablo Ortiz

cs.CL updates on arXiv.org

The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with
recordings of meetings from Stortinget, the Norwegian parliament. It is the
first publicly available dataset containing unscripted Norwegian speech
designed for training automatic speech recognition (ASR) systems. The
recordings are manually transcribed and annotated with language codes and
speakers, and detailed metadata about the speakers is included. The
transcriptions exist in both normalized and non-normalized form, and
non-standardized words are explicitly marked and annotated with standardized
equivalents. To …

