April 19, 2024, 1:21 p.m. | /u/grudev

Machine Learning www.reddit.com

I'm writing a proof of concept for a RAG application over hundreds of thousands of textual records stored in a Postgres DB, using pgvector to store the embeddings (with an HNSW index).
Vector dimensions are specified correctly.
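For reference, the storage side can be sketched roughly like this. The table and column names are hypothetical, and the dimension shown (768) is an assumption — it must match whatever the chosen embedding model actually emits:

```sql
-- hypothetical table; names and the 768 dimension are assumptions
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE records (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(768)   -- must match the embedding model's output size
);

-- HNSW index for approximate nearest-neighbour search (cosine distance)
CREATE INDEX ON records USING hnsw (embedding vector_cosine_ops);
```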

Currently running experiments with varied chunk sizes for the text and comparing two different embedding models
(actual chunk sizes can vary a little because I am not breaking words to force an exact size):
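The word-boundary chunking described above can be sketched as follows — a minimal version that packs whole words into chunks of roughly a target character count, so actual sizes drift a little rather than splitting words:

```python
def chunk_words(text: str, target_size: int) -> list[str]:
    """Split text into chunks of roughly target_size characters,
    breaking only at word boundaries (so actual sizes vary a little)."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty
        added = len(word) + (1 if current else 0)
        if current and length + added > target_size:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A real pipeline might chunk on sentence or paragraph boundaries instead, but the same "never break a word" property holds.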

- nomic-embed-text
- snowflake-arctic-embed-m-long
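Once both models have embedded the same chunks, they can be compared by running identical queries through each and judging which model's nearest neighbours are more relevant. A minimal sketch of the ranking step in pure Python — in the actual setup, pgvector's `<=>` cosine-distance operator would do this inside Postgres:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], records: list[tuple], k: int = 3) -> list:
    """records: list of (id, vector); return ids of the k most similar."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [rid for rid, _ in ranked[:k]]
```

The comparison is then just running `top_k` over the two models' embedding sets for the same queries and seeing which ranking looks better.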

Here's the gist of the experiment:

1- …

