Feb. 13, 2024, 5:49 a.m. | Haochen Tan, Zhijiang Guo, Zhan Shi, Lu Xu, Zhili Liu, Yunlong Feng, Xiaoguang Li, Yasheng Wang

cs.CL updates on arXiv.org

Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long-form content, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we introduce ProxyQA, a framework for evaluating long-form text generation, comprising in-depth human-curated meta-questions spanning various domains. Each meta-question contains corresponding proxy-questions with annotated answers. LLMs are prompted to …

cs.ai cs.cl

