Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition

April 2, 2024, 7:51 p.m. | Saied Alshahrani, Hesham Haroon, Ali Elfilali, Mariama Njie, Jeanna Matthews

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.00565v1
Abstract: Wikipedia articles (content pages) are commonly used corpora in Natural Language Processing (NLP) research, especially in low-resource languages other than English. Yet a few studies have examined the three Arabic Wikipedia editions, Arabic Wikipedia (AR), Egyptian Arabic Wikipedia (ARZ), and Moroccan Arabic Wikipedia (ARY), and documented issues in the Egyptian Arabic Wikipedia edition: its articles were created automatically and at massive scale using template-based translation from English to Arabic without human involvement, overwhelming the Egyptian …
