April 19, 2024, 4:43 a.m. | /u/awinml1 | Machine Learning | www.reddit.com

Say I have 1000 PDF docs that I use as input to a RAG Pipeline.

I want to evaluate the different steps of the RAG pipeline so that I can answer questions like:
- Which embedding models work best for my data?
- Which rerankers help, and are they needed at all?
- Which LLMs give the most factual and coherent answers?

How do I evaluate these steps of the pipeline?

Based on my research, I found that most frameworks require labels for both …
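To make the retrieval part concrete: the kind of check I could run myself, if I hand-label a small sample of query → relevant-chunk pairs, is roughly the sketch below. `embed_a` / `embed_b`, `queries`, `chunks`, and `labels` are placeholders I made up, not from any particular framework:

```python
# Minimal sketch: score candidate embedding models by recall@k / MRR on a small
# hand-labeled sample of (query, relevant PDF chunk) pairs.
import numpy as np

def recall_and_mrr_at_k(query_vecs, doc_vecs, relevant_doc_idx, k=5):
    """query_vecs: (num_queries, dim), doc_vecs: (num_docs, dim),
    relevant_doc_idx: index of the labeled relevant chunk for each query."""
    # Cosine similarity between every query and every document chunk.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T

    hits, reciprocal_ranks = 0, []
    for i, rel in enumerate(relevant_doc_idx):
        ranking = np.argsort(-sims[i])                    # best-first chunk indices
        rank = int(np.where(ranking == rel)[0][0]) + 1    # 1-based rank of the labeled chunk
        hits += rank <= k
        reciprocal_ranks.append(1.0 / rank)
    return hits / len(relevant_doc_idx), float(np.mean(reciprocal_ranks))

# Usage sketch: compare two embedding models on the same labeled sample.
# for name, embed in {"model_a": embed_a, "model_b": embed_b}.items():
#     r, mrr = recall_and_mrr_at_k(embed(queries), embed(chunks), labels, k=5)
#     print(name, "recall@5:", r, "MRR:", mrr)
```

The same small sample could be reused to check whether adding a reranker actually improves recall@k before I even look at the LLM answer quality.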

