Web: http://arxiv.org/abs/2201.09997

Jan. 26, 2022, 2:10 a.m. | Timofey Atnashev, Veronika Ganeeva, Roman Kazakov, Daria Matyash, Michael Sonkin, Ekaterina Voloshina, Oleg Serikov, Ekaterina Artemova

cs.CL updates on arXiv.org arxiv.org

The vast majority of existing datasets for Named Entity Recognition (NER) are
built primarily on news, research papers and Wikipedia, with a few exceptions
created from historical and literary texts. What is more, English is the main
source of data for further labelling. This paper aims to fill in multiple gaps
by creating a novel dataset, "Razmecheno", gathered from the Russian-language
diary texts of the "Prozhito" project. Our dataset is of interest for multiple research
lines: literary studies of …

