Is Reference Necessary in the Evaluation of NLG Systems? When and Where? | allainews.com

March 22, 2024, 4:48 a.m. | Shuqian Sheng, Yi Xu, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

cs.CL updates on arXiv.org

arXiv:2403.14275v1 Announce Type: new
Abstract: The majority of automatic metrics for evaluating NLG systems are reference-based. However, the difficulty of collecting human annotations leaves many application scenarios without reliable references. Despite recent advances in reference-free metrics, it is still not well understood when and where they can be used as an alternative to reference-based metrics. In this study, by employing diverse analytical approaches, we comprehensively assess the performance of both types of metrics across a wide range of …
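A reference-based metric scores each system output against a human-written reference, so it cannot be computed at all when no reference exists, which is the gap the abstract describes. The sketch below is a toy unigram-overlap F1, used here only as an illustrative stand-in; it is not one of the metrics studied in the paper, and the function name `unigram_f1` is hypothetical. A reference-free metric would instead score the output without the `reference` argument, for example by comparing it with the source input or by using a learned model.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy reference-based score (hypothetical helper, not from the paper):
    token-level F1 between a system output and a human reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        # Without a reference (or an output) there is nothing to compare.
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    # 5 of 6 tokens overlap in each direction, so precision = recall = F1 = 5/6.
    print(unigram_f1("the cat is on the mat", reference))
```

The toy score rewards surface overlap with one particular reference, which is why a missing or low-quality reference directly limits what a reference-based metric can tell you.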
