April 23, 2024, 4:43 a.m. | Alexander Shan, John Bauer, Riley Carlson, Christopher Manning

cs.LG updates on arXiv.org

arXiv:2404.13465v1 Announce Type: cross
Abstract: The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally. To test this, we build a newswire dataset, the Worldwide English NER Dataset, to analyze NER model performance on low-resource English variants from around the world. We test widely used NER toolkits and transformer …

