The Glass Ceiling of Automatic Evaluation in Natural Language Generation. (arXiv:2208.14585v1 [cs.CL])
cs.CL updates on arXiv.org
Automatic evaluation metrics capable of replacing human judgments are
critical to enabling fast development of new methods. Thus, numerous research
efforts have focused on crafting such metrics. In this work, we take a step
back and analyze recent progress by comparing the full body of existing
automatic and human metrics together. Since metrics are used based on how they
rank systems, we compare metrics in the space of system rankings. Our extensive
statistical analysis reveals surprising findings: automatic metrics -- …
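The idea of comparing metrics "in the space of system rankings" can be illustrated with a rank correlation such as Kendall's tau. This is a minimal sketch, not the paper's exact analysis; the system names and rank values below are made up for illustration:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings of the same systems.

    rank_a, rank_b: dicts mapping system name -> rank position (1 = best).
    Returns a value in [-1, 1]; 1 means the rankings agree on every pair.
    """
    systems = list(rank_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        da = rank_a[s] - rank_a[t]
        db = rank_b[s] - rank_b[t]
        if da * db > 0:       # pair ordered the same way by both rankings
            concordant += 1
        elif da * db < 0:     # pair ordered oppositely
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical rankings of four NLG systems by an automatic metric and by humans.
auto_rank  = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
human_rank = {"sysA": 2, "sysB": 1, "sysC": 3, "sysD": 4}
print(kendall_tau(auto_rank, human_rank))  # one swapped pair out of six
```

Comparing metrics by the rankings they induce, rather than by raw scores, reflects how practitioners actually use them: to pick the best system among candidates.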