April 19, 2024, 1:21 p.m. | /u/grudev

Machine Learning www.reddit.com

I'm writing a proof of concept for a RAG application over hundreds of thousands of textual records stored in a Postgres DB, using pgvector to store the embeddings (with an HNSW index).
Vector dimensions are specified correctly.
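For reference, the storage layer looks roughly like the sketch below. The table name, column names, embedding dimension, and connection string are placeholders (the post doesn't specify them); the dimension has to match whatever the chosen embedding model emits.

```python
# Rough sketch of the pgvector storage layer (names, dimension, and DSN are
# illustrative, not taken from the post). Requires the pgvector extension
# on the Postgres server and the psycopg (v3) driver.
import psycopg

EMBED_DIM = 768  # placeholder: must match the embedding model's output size

with psycopg.connect("dbname=rag_poc") as conn:  # placeholder DSN
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS chunks (
            id        bigserial PRIMARY KEY,
            record_id bigint,
            content   text,
            embedding vector({EMBED_DIM})
        );
    """)
    # HNSW index on the embedding column, here with cosine distance.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
        "ON chunks USING hnsw (embedding vector_cosine_ops);"
    )
    # Example ANN lookup (the query vector must come from the same model):
    # SELECT id, content FROM chunks
    # ORDER BY embedding <=> '[...]'::vector
    # LIMIT 5;
```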

I'm currently running experiments with varied chunk sizes for the text, comparing two different embedding models (the actual chunk size can vary a little because I don't break words to force an exact size; see the chunking sketch after the model list).

- nomic-embed-text
- snowflake-arctic-embed-m-long
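
A minimal sketch of that kind of word-boundary chunking is below; the target sizes and sample text are illustrative, not the actual experiment parameters.

```python
# Word-boundary chunker: aims for an approximate chunk size but never splits
# a word, so actual chunk lengths vary a little. Parameters are illustrative.
def chunk_text(text: str, target_chars: int = 1000) -> list[str]:
    words = text.split()
    chunks, current, length = [], [], 0
    for word in words:
        # +1 accounts for the joining space
        if current and length + len(word) + 1 > target_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + (1 if length else 0)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "lorem ipsum dolor sit amet consectetur " * 40  # placeholder record
    for size in (256, 512, 1024):  # hypothetical chunk-size sweep
        print(size, [len(c) for c in chunk_text(sample, target_chars=size)])
```

Each chunk would then be embedded with nomic-embed-text and snowflake-arctic-embed-m-long (however those models are served, e.g. locally) and written to the chunks table above.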

Here's the gist of the experiment:

1- …

