Feb. 26, 2024, 5:43 a.m. | Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun

cs.LG updates on arXiv.org

arXiv:2402.15089v1 Announce Type: cross
Abstract: Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence. However, evaluating the answer's attribution, i.e., whether every claim within the generated responses is fully supported by its cited evidence, remains an open problem. This verification, traditionally dependent on costly human evaluation, underscores the urgent need for automatic attribution evaluation methods. To bridge the gap in the absence of standardized benchmarks for these methods, we present AttributionBench, a …
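The truncated abstract frames attribution evaluation as a yes/no question: is each claim in a generated response entailed by its cited evidence? As a rough illustration (not the paper's own protocol, which is not shown here), a common baseline casts this as natural language inference, with the evidence as premise and the claim as hypothesis. The sketch below assumes the Hugging Face transformers library and the off-the-shelf microsoft/deberta-large-mnli model; the function name is_attributable and the 0.5 threshold are hypothetical choices for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: an off-the-shelf NLI model stands in for an attribution
# evaluator. This is NOT the AttributionBench method, only one common
# baseline formulation of the same verification question.
MODEL_NAME = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def is_attributable(claim: str, evidence: str, threshold: float = 0.5) -> bool:
    """Return True if the cited evidence entails the claim.

    `threshold` is a hypothetical cutoff on the entailment probability.
    """
    # NLI convention: premise = evidence, hypothesis = claim.
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    entail_id = model.config.label2id["ENTAILMENT"]
    return probs[entail_id].item() >= threshold

# Toy usage: a fully supported claim should come back True.
print(is_attributable(
    claim="The Eiffel Tower is located in Paris.",
    evidence="The Eiffel Tower is a wrought-iron tower in Paris, France.",
))
```

An automatic evaluator of this kind replaces the costly human verification the abstract describes; a benchmark like AttributionBench would then measure how often such a classifier agrees with human judgments.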
