all AI news
When your Cousin has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages
March 26, 2024, 4:51 a.m. | Niyati Bafna, Cristina Espa\~na-Bonet, Josef van Genabith, Beno\^it Sagot, Rachel Bawden
cs.CL updates on arXiv.org arxiv.org
Abstract: Most existing approaches for unsupervised bilingual lexicon induction (BLI) depend on good quality static or contextual embeddings requiring large monolingual corpora for both languages. However, unsupervised BLI is most likely to be useful for low-resource languages (LRLs), where large datasets are not available. Often we are interested in building bilingual resources for LRLs against related high-resource languages (HRLs), resulting in severely imbalanced data settings for BLI. We first show that state-of-the-art BLI methods in the …
abstract arxiv bilingual cs.cl data datasets embeddings good however languages large datasets low quality type unsupervised
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Principal Data Engineering Manager
@ Microsoft | Redmond, Washington, United States
Machine Learning Engineer
@ Apple | San Diego, California, United States