March 21, 2022, 1:11 a.m. | A. Stevie Bergman, Mona T. Diab

cs.CL updates on arXiv.org

When building NLP models, there is a tendency to aim for broader coverage,
often overlooking cultural and (socio)linguistic nuance. In this position
paper, we make the case for care and attention to such nuances, particularly in
dataset annotation, as well as the inclusion of cultural and linguistic
expertise in the process. We present a playbook for responsible dataset
creation for polyglossic, multidialectal languages. This work is informed by a
study on Arabic annotation of social media content.