mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset. (arXiv:2108.13897v5 [cs.CL] UPDATED)
Aug. 18, 2022, 1:11 a.m. | Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
cs.CL updates on arXiv.org
The MS MARCO ranking dataset has been widely used for training deep learning
models for IR tasks, achieving considerable effectiveness on diverse zero-shot
scenarios. However, this type of resource is scarce in languages other than
English. In this work, we present mMARCO, a multilingual version of the MS
MARCO passage ranking dataset comprising 13 languages that was created using
machine translation. We evaluated mMARCO by finetuning monolingual and
multilingual reranking models, as well as a multilingual dense retrieval model
on …