CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark. (arXiv:2112.13610v2 [cs.CL] UPDATED)
cs.CL updates on arXiv.org
Realizing general-purpose language intelligence has been a longstanding goal
for natural language processing, where standard evaluation benchmarks play a
fundamental and guiding role. We argue that for general-purpose language
intelligence evaluation, the benchmark itself needs to be comprehensive and
systematic. To this end, we propose CUGE, a Chinese Language Understanding and
Generation Evaluation benchmark with the following features: (1) Hierarchical
benchmark framework, where datasets are selected on principled grounds and organized
into a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy,
where different …
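As a rough illustration of the capability-task-dataset hierarchy in feature (1), the following minimal sketch models it as a tree: each language capability groups tasks, and each task groups datasets. All capability, task, and dataset names are hypothetical placeholders, not CUGE's actual contents, and the classes and the dataset_paths helper are illustrative assumptions. The multi-level scoring strategy in feature (2) is cut off in this excerpt, so the sketch does not model any scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A task node (hypothetical structure); each task groups one or more datasets."""
    name: str
    datasets: list[str] = field(default_factory=list)


@dataclass
class Capability:
    """A language-capability node (hypothetical structure); it groups several tasks."""
    name: str
    tasks: list[Task] = field(default_factory=list)


def dataset_paths(capabilities: list[Capability]) -> list[tuple[str, str, str]]:
    """Flatten the hierarchy into (capability, task, dataset) triples."""
    return [
        (cap.name, task.name, ds)
        for cap in capabilities
        for task in cap.tasks
        for ds in task.datasets
    ]


# Placeholder names only (hypothetical); they are not CUGE's actual entries.
benchmark = [
    Capability("capability_A", [Task("task_A1", ["dataset_x", "dataset_y"])]),
    Capability("capability_B", [Task("task_B1", ["dataset_z"]), Task("task_B2", [])]),
]

for path in dataset_paths(benchmark):
    print(" / ".join(path))
```

Keeping datasets as leaves under tasks lets any per-dataset result be traced back to the task and capability it is meant to measure, which is the point of organizing a benchmark hierarchically rather than as a flat list.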