CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models. (arXiv:2401.17043v1 [cs.CL]) | allainews.com

Jan. 31, 2024, 4:41 p.m. | Yuanjie Lyu, Zhiyu Li, Simin Niu, Feiyu Xiong, Bo Tang, Wenjin Wang, Hao Wu, Huanyong Liu, Tong Xu, Enhong Chen

cs.CL updates on arXiv.org arxiv.org

Retrieval-Augmented Generation (RAG) is a technique that enhances the
capabilities of large language models (LLMs) by incorporating external
knowledge sources. This method addresses common LLM limitations, including
outdated information and the tendency to produce inaccurate "hallucinated"
content. However, the evaluation of RAG systems is challenging, as existing
benchmarks are limited in scope and diversity. Most of the current benchmarks
predominantly assess question-answering applications, overlooking the broader
spectrum of situations where RAG could prove advantageous. Moreover, they only
evaluate the performance …

arxiv benchmark capabilities chinese crud cs.cl evaluation information knowledge language language models large language large language models limitations llm llms rag retrieval retrieval-augmented systems

More from arxiv.org / cs.CL updates on arXiv.org

Sparse is Enough in Fine-tuning Pre-trained Large Language Models 19 hours ago | arxiv.org

arxiv cs.ai cs.cl cs.lg +6

On the Learnability of Watermarks for Language Models 19 hours ago | arxiv.org

abstract arxiv cs.cl cs.cr +17

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization 19 hours ago | arxiv.org

abstract arxiv capabilities cs.ai +14

Evaluating Generative Ad Hoc Information Retrieval 19 hours ago | arxiv.org

abstract advances arxiv cs.cl +19

Language Models As Semantic Indexers 19 hours ago | arxiv.org

arxiv cs.cl cs.ir cs.lg +4

Large language models can accurately predict searcher preferences 19 hours ago | arxiv.org

abstract arxiv cs.ai cs.cl +16

On the Reliability of Watermarks for Large Language Models 19 hours ago | arxiv.org

abstract arxiv become bots +28

A Watermark for Large Language Models 19 hours ago | arxiv.org

abstract arxiv cs.cl cs.cr +16

CreoleVal: Multilingual Multitask Benchmarks for Creoles 19 hours ago | arxiv.org

abstract annotated data arxiv benchmarks +14

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Data Science Analyst

@ Mayo Clinic | AZ, United States

View on ai-jobs.net

Sr. Data Scientist (Network Engineering)

@ SpaceX | Redmond, WA

View on ai-jobs.net