March 26, 2024, 4:51 a.m. | Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden

cs.CL updates on arXiv.org

arXiv:2305.14012v2
Abstract: Most existing approaches for unsupervised bilingual lexicon induction (BLI) depend on good-quality static or contextual embeddings requiring large monolingual corpora for both languages. However, unsupervised BLI is most likely to be useful for low-resource languages (LRLs), where large datasets are not available. Often we are interested in building bilingual resources for LRLs against related high-resource languages (HRLs), resulting in severely imbalanced data settings for BLI. We first show that state-of-the-art BLI methods in the …
