Feb. 13, 2024, 5:48 a.m. | Guanyi Chen, Fahime Same, Kees van Deemter

cs.CL updates on arXiv.org

Recently, a human evaluation study of Referring Expression Generation (REG) models reached an unexpected conclusion: on WebNLG, Referring Expressions (REs) generated by state-of-the-art neural models were indistinguishable not only from the REs in WebNLG but also from the REs generated by a simple rule-based system. Here, we argue that this limitation could stem from the use of a purely ratings-based human evaluation (a common practice in Natural Language Generation). To investigate these issues, we propose an intrinsic …


