April 19, 2024, 1:21 p.m. | /u/grudev

Machine Learning www.reddit.com

I'm writing a proof of concept for a RAG application over hundreds of thousands of textual records stored in a Postgres DB, using pgvector to store the embeddings (with an HNSW index).
Vector dimensions are specified correctly.
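For reference, the storage side can be sketched roughly like this. The table and column names are hypothetical, and the dimension shown (768) is an assumption — it must match whatever the chosen embedding model actually emits:

```sql
-- hypothetical table; names and the 768 dimension are assumptions
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE records (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(768)   -- must match the embedding model's output size
);

-- HNSW index for approximate nearest-neighbour search (cosine distance)
CREATE INDEX ON records USING hnsw (embedding vector_cosine_ops);
```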

Currently running experiments with varied chunk sizes for the text and comparing two different embedding models
(actual chunk sizes can vary a little because I am not breaking words to force an exact size):
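The word-boundary chunking described above can be sketched as follows — a minimal version that packs whole words into chunks of roughly a target character count, so actual sizes drift a little rather than splitting words:

```python
def chunk_words(text: str, target_size: int) -> list[str]:
    """Split text into chunks of roughly target_size characters,
    breaking only at word boundaries (so actual sizes vary a little)."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty
        added = len(word) + (1 if current else 0)
        if current and length + added > target_size:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A real pipeline might chunk on sentence or paragraph boundaries instead, but the same "never break a word" property holds.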

- nomic-embed-text
- snowflake-arctic-embed-m-long
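Once both models have embedded the same chunks, they can be compared by running identical queries through each and judging which model's nearest neighbours are more relevant. A minimal sketch of the ranking step in pure Python — in the actual setup, pgvector's `<=>` cosine-distance operator would do this inside Postgres:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], records: list[tuple], k: int = 3) -> list:
    """records: list of (id, vector); return ids of the k most similar."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [rid for rid, _ in ranked[:k]]
```

The comparison is then just running `top_k` over the two models' embedding sets for the same queries and seeing which ranking looks better.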

Here's the gist of the experiment:

1- …

