May 14, 2024, 4:49 a.m. | Cagri Toraman

cs.CL updates on arXiv.org

arXiv:2405.07745v1 Announce Type: new
Abstract: Despite advancements in English-dominant generative large language models, further development is needed for low-resource languages to enhance global accessibility. The primary methods for representing these languages are monolingual and multilingual pretraining. Monolingual pretraining is expensive due to hardware requirements, and multilingual models often have uneven performance across languages. This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages. We assess various strategies, including continual training, instruction fine-tuning, …

