April 3, 2024, 4:46 a.m. | Frank Palma Gomez, Ramon Sanabria, Yun-hsuan Sung, Daniel Cer, Siddharth Dalmia, Gustavo Hernandez Abrego

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.01616v1 Announce Type: new
Abstract: Large language models (LLMs) are trained on text-only data that go far beyond the languages with paired speech and text data. At the same time, Dual Encoder (DE) based retrieval systems project queries and documents into the same embedding space and have demonstrated their success in retrieval and bi-text mining. To match speech and text in many languages, we propose using LLMs to initialize multi-modal DE retrieval systems. Unlike traditional methods, our system doesn't require …

abstract arxiv beyond cross-lingual cs.cl cs.ir cs.sd data documents eess.as embedding encoder language language models languages large language large language models llms modal project queries retrieval space speech success systems text type

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Consultant - Artificial Intelligence & Data (Google Cloud Data Engineer) - MY / TH

@ Deloitte | Kuala Lumpur, MY