Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
April 4, 2024, 4:47 a.m. | Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei
cs.CL updates on arXiv.org arxiv.org
Abstract: Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements. The underlying reason is that one semantic meaning can actually be expressed in different forms, and the evaluation with a single or few references may not accurately reflect the quality of the model's hypotheses. To address this issue, this paper presents a simple and effective method, named Div-Ref, to …
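The core idea behind the abstract can be sketched as follows: expand each gold reference into multiple paraphrased variants, then score a hypothesis by the best match over all variants, so that a valid but differently-worded output is not penalized. This is only an illustrative sketch, not the paper's implementation: `paraphrase` stands in for an LLM-based paraphraser, and the unigram-F1 metric is a simple placeholder for whatever evaluation metric is being diversified.

```python
# Illustrative sketch of diversified-reference evaluation (assumed, not
# taken from the paper's code): score a hypothesis against the maximum
# over all original and paraphrased references.

def unigram_f1(hyp: str, ref: str) -> float:
    """Simple unigram-overlap F1 between a hypothesis and one reference."""
    h, r = set(hyp.lower().split()), set(ref.lower().split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def paraphrase(ref: str) -> list[str]:
    # Hypothetical stand-in: in a real setup this would return
    # LLM-generated paraphrases of the reference.
    return [ref]

def diversified_score(hyp: str, references: list[str]) -> float:
    """Max metric score over all references and their paraphrases."""
    variants = [v for ref in references for v in [ref] + paraphrase(ref)]
    return max(unigram_f1(hyp, v) for v in variants)
```

With a real paraphraser plugged in, a hypothesis that matches any legitimate rewording of a reference would receive the high score, which is the intuition the abstract gives for improving correlation with human judgement.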