Feb. 9, 2024, 5:47 a.m. | Liang Wang Nan Yang Xiaolong Huang Linjun Yang Rangan Majumder Furu Wei

cs.CL updates on arXiv.org arxiv.org

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, …

cs.cl cs.ir

Research Scholar (Technical Research)

@ Centre for the Governance of AI | Hybrid; Oxford, UK

HPC Engineer (x/f/m) - DACH

@ Meshcapade GmbH | Remote, Germany

Data Engineering Director-Big Data technologies (Hadoop, Spark, Hive, Kafka)

@ Visa | Bengaluru, India

Senior Data Engineer

@ Manulife | Makati City, Manulife Philippines Head Office

GDS Consulting Senior Data Scientist 2

@ EY | Taguig, PH, 1634

IT Data Analyst Team Lead

@ Rosecrance | Rockford, Illinois, United States