April 22, 2024, 7:46 p.m. | ahgsql

DEV Community dev.to

Challenges with Turkish Text in Vector Search


In one of my projects involving Turkish text data, I ran into a well-known difficulty: vector search (semantic search) is harder to get right on non-English text. I tried several embedding methods, including OpenAI and Cohere Multilingual (V3), and the results were consistently unsatisfactory.



The problem took one of two forms: either the search results were inaccurate and irrelevant, or I had to retrieve an excessive number of documents (more than 15) to get satisfactory results. This …

