June 11, 2024, 4:41 a.m. | Weiping Fu, Bifan Wei, Jianxiang Hu, Zhongmin Cai, Jun Liu

arXiv:2406.05707v1
Abstract: Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, so their quality requires reliable and comprehensive evaluation. Human evaluation is widely used in question generation (QG), is among the most accurate evaluation methods, and serves as the reference standard for automatic metrics. However, the lack of unified evaluation criteria hampers the development of both QG technologies and automatic evaluation methods. To …

