Aug. 23, 2022, 1:14 a.m. | David Ifeoluwa Adelani, Jesujoba Oluwadara Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, Dietrich Klakow, Peter Nabende, E

cs.CL updates on arXiv.org

Recent advances in the pre-training of language models leverage large-scale
datasets to create multilingual models. However, low-resource languages are
mostly left out in these datasets. This is primarily because many widely spoken
languages are not well represented on the web and therefore excluded from the
large-scale crawls used to create datasets. Furthermore, downstream users of
these models are restricted to the selection of languages originally chosen for
pre-training. This work investigates how to optimally leverage existing
pre-trained models to create …
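The truncated sentence points at adapting an existing pre-trained multilingual model to a language pair it was not originally covered for. As a rough illustration only (not the authors' exact recipe, which the excerpt does not describe), the sketch below fine-tunes a publicly available multilingual translation checkpoint on a small parallel corpus using Hugging Face transformers; the checkpoint name (facebook/m2m100_418M), the English–Hausa pair, the file train.jsonl, and all hyperparameters are assumptions made for the example.

# Hedged sketch: adapt an existing multilingual translation checkpoint to a
# new low-resource language pair by fine-tuning on a small parallel corpus.
# Model name, language codes, file paths, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    M2M100ForConditionalGeneration,
    M2M100Tokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/m2m100_418M"  # assumed public multilingual checkpoint
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

# Small parallel corpus (a few thousand sentence pairs) stored as JSON lines
# with "src" and "tgt" fields -- hypothetical file layout.
raw = load_dataset("json", data_files={"train": "train.jsonl"})

tokenizer.src_lang = "en"  # source language code (assumption)
tokenizer.tgt_lang = "ha"  # target low-resource language code (assumption)

def preprocess(batch):
    # Tokenize source sentences; tokenize targets as labels for the decoder.
    model_inputs = tokenizer(batch["src"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["tgt"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw["train"].map(preprocess, batched=True, remove_columns=["src", "tgt"])

# Pads inputs and labels per batch; label padding is masked out of the loss.
collator = DataCollatorForSeq2Seq(tokenizer, model=model)

args = Seq2SeqTrainingArguments(
    output_dir="m2m100-finetuned",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
    tokenizer=tokenizer,
)
trainer.train()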

Tags: arxiv, go, news, pre-trained models, translation
