Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Feb. 23, 2024, 5:48 a.m. | Xiuying Chen, Tairan Wang, Qingqing Zhu, Taicheng Guo, Shen Gao, Zhiyong Lu, Xin Gao, Xiangliang Zhang
cs.CL updates on arXiv.org
Abstract: The summarization capabilities of pretrained and large language models (LLMs) have been widely validated in general domains, but their application to scientific corpora, which involve complex sentences and specialized knowledge, has been less assessed. This paper presents conceptual and experimental analyses of scientific summarization, highlighting the inadequacies of traditional evaluation methods, such as $n$-gram overlap, embedding comparison, and QA-based metrics, particularly in providing explanations, grasping scientific concepts, or identifying key content. Subsequently, we introduce the Facet-aware Metric …
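To make the abstract's critique concrete, here is a minimal sketch (not the paper's method) of a ROUGE-1-style unigram-overlap F1 score, representative of the $n$-gram metrics the authors argue are inadequate; the function name and the example summaries are illustrative, not taken from the paper.

```python
# Minimal sketch of an n-gram overlap metric (ROUGE-1-style F1),
# the kind of surface-level evaluation the abstract criticizes.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Two summaries stating the same finding in different terminology score
# poorly, illustrating why pure n-gram overlap misses scientific concepts.
ref = "The model attains state-of-the-art accuracy on protein structure benchmarks"
cand = "The approach reaches best-known performance when predicting protein folding"
print(f"ROUGE-1 F1: {rouge1_f1(cand, ref):.3f}")  # low despite similar meaning
```

Because such scores reward lexical overlap rather than conceptual correctness, they also cannot explain *why* a summary is good or bad, which motivates the paper's facet-aware, explainable alternative.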