Web: http://arxiv.org/abs/2206.11249

June 24, 2022, 1:12 a.m. | Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashi

cs.CL updates on arXiv.org arxiv.org

Evaluation in machine learning is usually informed by past choices, for
example which datasets or metrics to use. This standardization enables
comparison on equal footing using leaderboards, but the evaluation choices
become sub-optimal as better alternatives arise. This problem is especially
pertinent in natural language generation, which requires ever-improving suites
of datasets, metrics, and human evaluation to make definitive claims. To make
it easier to follow best practices in model evaluation, we introduce GEMv2. The new
version of the Generation, Evaluation, and …

