April 12, 2024, 4:42 a.m. | Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag

cs.LG updates on arXiv.org

arXiv:2404.07982v1 Announce Type: cross
Abstract: Multilinguality is crucial for extending recent advancements in language modelling to diverse linguistic communities. To maintain high performance while representing multiple languages, multilingual models ideally align representations, allowing what is learned in one language to generalise to others. Prior research has emphasised the importance of parallel data and shared vocabulary elements as key factors for such alignment. In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance. In controlled experiments on …
