April 19, 2024, 1:21 p.m. | /u/grudev

Machine Learning www.reddit.com

I'm writing a proof of concept for a RAG application over hundreds of thousands of textual records stored in a Postgres DB, using pgvector to store the embeddings (with an HNSW index).
Vector dimensions are specified correctly.
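For reference, the storage layer looks roughly like the sketch below. The table name, column names, embedding dimension, and connection string are placeholders (the post doesn't specify them); the dimension has to match whatever the chosen embedding model emits.

```python
# Rough sketch of the pgvector storage layer (names, dimension, and DSN are
# illustrative, not taken from the post). Requires the pgvector extension
# on the Postgres server and the psycopg (v3) driver.
import psycopg

EMBED_DIM = 768  # placeholder: must match the embedding model's output size

with psycopg.connect("dbname=rag_poc") as conn:  # placeholder DSN
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS chunks (
            id        bigserial PRIMARY KEY,
            record_id bigint,
            content   text,
            embedding vector({EMBED_DIM})
        );
    """)
    # HNSW index on the embedding column, here with cosine distance.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
        "ON chunks USING hnsw (embedding vector_cosine_ops);"
    )
    # Example ANN lookup (the query vector must come from the same model):
    # SELECT id, content FROM chunks
    # ORDER BY embedding <=> '[...]'::vector
    # LIMIT 5;
```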

I'm currently running experiments with varied chunk sizes for the text, comparing two different embedding models (the actual chunk size can vary a little because I don't break words to force an exact size; see the chunking sketch after the model list).

- nomic-embed-text
- snowflake-arctic-embed-m-long
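
A minimal sketch of that kind of word-boundary chunking is below; the target sizes and sample text are illustrative, not the actual experiment parameters.

```python
# Word-boundary chunker: aims for an approximate chunk size but never splits
# a word, so actual chunk lengths vary a little. Parameters are illustrative.
def chunk_text(text: str, target_chars: int = 1000) -> list[str]:
    words = text.split()
    chunks, current, length = [], [], 0
    for word in words:
        # +1 accounts for the joining space
        if current and length + len(word) + 1 > target_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + (1 if length else 0)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "lorem ipsum dolor sit amet consectetur " * 40  # placeholder record
    for size in (256, 512, 1024):  # hypothetical chunk-size sweep
        print(size, [len(c) for c in chunk_text(sample, target_chars=size)])
```

Each chunk would then be embedded with nomic-embed-text and snowflake-arctic-embed-m-long (however those models are served, e.g. locally) and written to the chunks table above.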

Here's the gist of the experiment:

1- …

