April 2, 2024, 7:43 p.m. | Jakub Piskorski, Michał Marcińczuk, Roman Yangarber

cs.LG updates on arXiv.org

arXiv:2404.00482v1 Announce Type: cross
Abstract: This paper presents a corpus manually annotated with named entities for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian, and Ukrainian. This work is the result of a series of shared tasks conducted in 2017–2023 as part of the Workshops on Slavic Natural Language Processing. The corpus consists of 5,017 documents on seven topics. The documents are annotated with five classes of named entities. Each entity is described by a category, a …

