Nov. 20, 2023, 7:36 a.m. | Sana Hassan

MarkTechPost www.marktechpost.com

Researchers from Google Research, Google DeepMind, and the University of Waterloo introduce SWIM-IR, a synthetic retrieval training dataset encompassing 33 languages, addressing the challenge of limited human-labeled training pairs in multilingual retrieval. Leveraging the SAP (summarize-then-ask prompting) method, SWIM-IR is constructed to enable synthetic fine-tuning of multilingual dense retrieval models without human supervision. SWIM-X models, […]


The post "A New AI Research Releases SWIM-IR: A Large-Scale Synthetic Multilingual Retrieval Dataset with 28 Million Training Pairs over 33 Languages" appeared first …
