April 22, 2024, 7:46 p.m. | ahgsql

DEV Community dev.to

Challenges with Turkish Text in Vector Search


In one of my projects involving Turkish text data, I ran into the well-known difficulty of using non-English languages in vector search (semantic search) pipelines. Despite trying various embedding models, including OpenAI's and Cohere Multilingual (V3), I consistently got unsatisfactory results.



The issues I faced were twofold: either the search results lacked accuracy and relevance, or I had to retrieve an excessive number of documents (more than 15) to achieve satisfactory results. This …
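The excerpt does not show the author's code. The sketch below illustrates what "retrieving more than 15 documents" means in a vector search pipeline: every document is ranked by cosine similarity to the query embedding, and only the top k are kept. It is a minimal in-memory version under stated assumptions. The names `cosineSimilarity` and `topK` and the toy 3-dimensional vectors are hypothetical. In practice, the embeddings would come from a model such as OpenAI's or Cohere Multilingual (V3), and the search would usually run inside a vector store.

```typescript
// Minimal in-memory top-k semantic search sketch.
// Assumption: embeddings are already computed by some embedding model;
// the vectors below are hypothetical toy values, not real embeddings.

type Doc = { id: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query, sort by descending similarity,
// and keep the k best. A larger k makes it more likely that a relevant
// document is included, but also passes more irrelevant text downstream.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical toy corpus; real embeddings have hundreds or thousands of dimensions.
const docs: Doc[] = [
  { id: "a", embedding: [1, 0, 0] },
  { id: "b", embedding: [0.8, 0.6, 0] },
  { id: "c", embedding: [0, 1, 0] },
];

const results = topK([1, 0, 0], docs, 2);
console.log(results.map((r) => r.id)); // top-2 ids: a, then b
```

When the embedding model places relevant Turkish passages poorly in this ranking, the only lever left is to raise k. That is the trade-off the author describes: either fewer, less relevant results or many more retrieved documents.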

