Feb. 13, 2024, 5:49 a.m. | Haochen Tan Zhijiang Guo Zhan Shi Lu Xu Zhili Liu Yunlong Feng Xiaoguang Li Yasheng Wang

cs.CL updates on arXiv.org arxiv.org

Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long-form content, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we introduce ProxyQA, a framework for evaluating long-form text generation, comprising in-depth human-curated *meta-questions* spanning various domains. Each meta-question contains corresponding *proxy-questions* with annotated answers. LLMs are prompted to …

cs.AI cs.CL
