Feb. 26, 2024, 5:43 a.m. | Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun

cs.LG updates on arXiv.org

arXiv:2402.15089v1 Announce Type: cross
Abstract: Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence. However, evaluating the answer's attribution, i.e., whether every claim within the generated responses is fully supported by its cited evidence, remains an open problem. This verification, traditionally dependent on costly human evaluation, underscores the urgent need for automatic attribution evaluation methods. To bridge the gap in the absence of standardized benchmarks for these methods, we present AttributionBench, a …
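The truncated abstract frames attribution evaluation as a yes/no question: is each claim in a generated response entailed by its cited evidence? As a rough illustration (not the paper's own protocol, which is not shown here), a common baseline casts this as natural language inference, with the evidence as premise and the claim as hypothesis. The sketch below assumes the Hugging Face transformers library and the off-the-shelf microsoft/deberta-large-mnli model; the function name is_attributable and the 0.5 threshold are hypothetical choices for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: an off-the-shelf NLI model stands in for an attribution
# evaluator. This is NOT the AttributionBench method, only one common
# baseline formulation of the same verification question.
MODEL_NAME = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def is_attributable(claim: str, evidence: str, threshold: float = 0.5) -> bool:
    """Return True if the cited evidence entails the claim.

    `threshold` is a hypothetical cutoff on the entailment probability.
    """
    # NLI convention: premise = evidence, hypothesis = claim.
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    entail_id = model.config.label2id["ENTAILMENT"]
    return probs[entail_id].item() >= threshold

# Toy usage: a fully supported claim should come back True.
print(is_attributable(
    claim="The Eiffel Tower is located in Paris.",
    evidence="The Eiffel Tower is a wrought-iron tower in Paris, France.",
))
```

An automatic evaluator of this kind replaces the costly human verification the abstract describes; a benchmark like AttributionBench would then measure how often such a classifier agrees with human judgments.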
