Feb. 26, 2024, 5:43 a.m. | Anis Koubaa, Adel Ammar, Lahouari Ghouti, Omar Najar, Serry Sibaee

cs.LG updates on arXiv.org

arXiv:2402.15313v1 Announce Type: cross
Abstract: The predominance of English and Latin-based large language models (LLMs) has led to a notable deficit in native Arabic LLMs. This discrepancy is accentuated by the prevalent inclusion of English tokens in existing Arabic models, detracting from their efficacy in processing native Arabic's intricate morphology and syntax. Consequently, there is a theoretical and practical imperative for developing LLMs predominantly focused on Arabic linguistic elements. To address this gap, this paper proposes ArabianGPT, a series of …
