Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
April 4, 2024, 4:47 a.m. | Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei
cs.CL updates on arXiv.org arxiv.org
Abstract: Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements. The underlying reason is that one semantic meaning can actually be expressed in different forms, and the evaluation with a single or few references may not accurately reflect the quality of the model's hypotheses. To address this issue, this paper presents a simple and effective method, named Div-Ref, to …
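The core idea behind the abstract can be sketched as follows: expand each gold reference into multiple paraphrased variants, then score a hypothesis by the best match over all variants, so that a valid but differently-worded output is not penalized. This is only an illustrative sketch, not the paper's implementation: `paraphrase` stands in for an LLM-based paraphraser, and the unigram-F1 metric is a simple placeholder for whatever evaluation metric is being diversified.

```python
# Illustrative sketch of diversified-reference evaluation (assumed, not
# taken from the paper's code): score a hypothesis against the maximum
# over all original and paraphrased references.

def unigram_f1(hyp: str, ref: str) -> float:
    """Simple unigram-overlap F1 between a hypothesis and one reference."""
    h, r = set(hyp.lower().split()), set(ref.lower().split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def paraphrase(ref: str) -> list[str]:
    # Hypothetical stand-in: in a real setup this would return
    # LLM-generated paraphrases of the reference.
    return [ref]

def diversified_score(hyp: str, references: list[str]) -> float:
    """Max metric score over all references and their paraphrases."""
    variants = [v for ref in references for v in [ref] + paraphrase(ref)]
    return max(unigram_f1(hyp, v) for v in variants)
```

With a real paraphraser plugged in, a hypothesis that matches any legitimate rewording of a reference would receive the high score, which is the intuition the abstract gives for improving correlation with human judgement.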