all AI news
Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data
March 20, 2024, 4:48 a.m. | Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank
cs.CL updates on arXiv.org arxiv.org
Abstract: Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs from standard German in lexical distribution, syntactic construction, and entity information. We conduct in-domain, cross-domain, sequential, and …
abstract articles arxiv cs.cl data dataset extract german information key ner paper recognition resources tokens tweet tweets type wikipedia
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Sr. VBI Developer II
@ Atos | Texas, US, 75093
Wealth Management - Data Analytics Intern/Co-op Fall 2024
@ Scotiabank | Toronto, ON, CA