ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
April 4, 2024, 4:47 a.m. | Osvaldo Luamba Quinjica, David Ifeoluwa Adelani
cs.CL updates on arXiv.org
Abstract: In recent years, the development of pre-trained language models (PLMs) has gained momentum, showcasing their capacity to transcend linguistic barriers and facilitate knowledge transfer across diverse languages. However, this progress has largely bypassed very low-resource languages, leaving a notable void in the multilingual landscape. This paper addresses that gap by introducing four PLMs specifically fine-tuned for Angolan languages using a Multilingual Adaptive Fine-tuning (MAFT) approach. In this paper, we survey the …
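The title mentions OFA embedding initialization, whose core idea is to initialize a target-language model's embedding matrix from a source multilingual model rather than from scratch. Below is a minimal, hypothetical sketch of that general idea (not the paper's actual method): tokens shared between the two vocabularies copy their source embedding, while unseen tokens fall back to the mean source embedding plus small noise, a deliberate simplification of the similarity-weighted schemes used in practice. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def init_target_embeddings(source_vocab, source_emb, target_vocab, seed=0):
    """Initialize a target-vocabulary embedding matrix from a source model.

    Tokens that also appear in the source vocabulary copy their source
    embedding; tokens the source has never seen fall back to the mean
    source embedding plus small Gaussian noise (a simplification of
    similarity-based initialization schemes such as OFA).
    """
    rng = np.random.default_rng(seed)
    dim = source_emb.shape[1]
    mean_vec = source_emb.mean(axis=0)
    src_index = {tok: i for i, tok in enumerate(source_vocab)}
    target_emb = np.empty((len(target_vocab), dim))
    for j, tok in enumerate(target_vocab):
        if tok in src_index:
            # Overlapping token: reuse the source embedding directly.
            target_emb[j] = source_emb[src_index[tok]]
        else:
            # New token: mean source embedding plus small noise.
            target_emb[j] = mean_vec + 0.01 * rng.standard_normal(dim)
    return target_emb

# Toy example: a 4-token source vocabulary with 4-dimensional embeddings,
# and a 3-token target vocabulary that overlaps on two tokens.
source_vocab = ["the", "of", "ngola", "kimbundu"]
source_emb = np.arange(16, dtype=float).reshape(4, 4)
target_emb = init_target_embeddings(
    source_vocab, source_emb, ["ngola", "muene", "kimbundu"]
)
```

A real MAFT pipeline would then continue pretraining the resulting model on monolingual text in the target languages; the initialization above only determines the starting point for that adaptation.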