Feb. 27, 2024, 5:50 a.m. | Zifan Wang, Kotaro Funakoshi, Manabu Okumura

cs.CL updates on arXiv.org

arXiv:2309.12546v2 Announce Type: replace
Abstract: Conventional automatic evaluation metrics, such as BLEU and ROUGE, developed for natural language generation (NLG) tasks, are based on measuring the n-gram overlap between the generated and reference text. These simple metrics may be insufficient for more complex tasks, such as question generation (QG), which requires generating questions that are answerable by the reference answers. Developing a more sophisticated automatic evaluation metric, thus, remains an urgent problem in QG research. This work proposes PMAN (Prompting-based …
