March 28, 2024, 6:55 p.m. | /u/dxtros

Machine Learning www.reddit.com

Abstract:
We demonstrate a technique that dynamically adapts the number of documents in a top-k retriever RAG prompt using feedback from the LLM. This yields a 4x cost reduction for RAG LLM question answering while maintaining the same level of accuracy. We also show that the method helps explain the lineage of LLM outputs.
The reference implementation works with most models (GPT4, many local models, older GPT-3.5 turbo) and can be used with most vector databases exposing a …
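The core loop described in the abstract can be sketched as follows. This is a minimal illustration, not the reference implementation: `retrieve` (a vector-DB top-k lookup) and `llm_says_sufficient` (the LLM feedback call) are hypothetical stand-ins for whatever retriever and model the reader plugs in.

```python
from typing import Callable, List


def adaptive_topk_retrieve(
    question: str,
    retrieve: Callable[[str, int], List[str]],              # assumed: top-k vector search
    llm_says_sufficient: Callable[[str, List[str]], bool],  # assumed: LLM feedback check
    k_start: int = 2,
    k_max: int = 16,
) -> List[str]:
    """Grow k geometrically until the LLM reports the context is sufficient.

    Starting small and expanding only on negative feedback is what keeps the
    average prompt short (hence the cost reduction); the returned document set
    also records which sources the final answer drew on (lineage).
    """
    k = k_start
    while True:
        docs = retrieve(question, k)
        if llm_says_sufficient(question, docs) or k >= k_max:
            return docs
        k = min(k * 2, k_max)
```

In practice the feedback call is itself a cheap LLM prompt (e.g. "can the question be answered from this context? yes/no"), so the savings come from most questions terminating at a small k.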

