Language Imbalance Can Boost Cross-lingual Generalisation
April 12, 2024, 4:42 a.m. | Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag
cs.LG updates on arXiv.org arxiv.org
Abstract: Multilinguality is crucial for extending recent advancements in language modelling to diverse linguistic communities. To maintain high performance while representing multiple languages, multilingual models ideally align representations, allowing what is learned in one language to generalise to others. Prior research has emphasised the importance of parallel data and shared vocabulary elements as key factors for such alignment. In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance. In controlled experiments on …