ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain
Feb. 22, 2024, 5:48 a.m. | Bingchao Wang
cs.CL updates on arXiv.org arxiv.org
Abstract: Recently, various evaluation datasets for Large Language Models (LLMs) have emerged, but most of them suffer from distorted rankings and make it difficult to analyze model capabilities. Addressing these concerns, this paper introduces ANGO, a Chinese multiple-choice question evaluation benchmark. ANGO proposes a Keypoint categorization standard for the first time: each question in ANGO can correspond to multiple keypoints, effectively enhancing the interpretability of evaluation results. Based on the performance of real humans, we build a quantifiable question difficulty standard …
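The abstract describes two design ideas: questions that map to multiple keypoints, and a difficulty score derived from real human performance. The minimal Python sketch below illustrates how such a record might look; the field names, schema, and difficulty formula are assumptions for illustration, not ANGO's actual data format.

from dataclasses import dataclass, field

# Hypothetical sketch of an ANGO-style question record (schema assumed, not official).
@dataclass
class AngoQuestion:
    question: str
    choices: list[str]
    answer: str                                          # correct choice label, e.g. "B"
    keypoints: list[str] = field(default_factory=list)   # one question may map to several keypoints
    human_accuracy: float = 0.0                          # fraction of real test-takers who answered correctly

    @property
    def difficulty(self) -> float:
        # Assumed quantification: fewer correct human answers -> higher difficulty (1.0 = hardest).
        return 1.0 - self.human_accuracy

# Example usage with made-up data.
q = AngoQuestion(
    question="下列哪个成语使用正确?",
    choices=["A ...", "B ...", "C ...", "D ..."],
    answer="B",
    keypoints=["idiom usage", "reading comprehension"],
    human_accuracy=0.42,
)
print(q.difficulty)  # 0.58

Grouping model accuracy by keypoint rather than by whole question is what would let a benchmark like this report which capabilities a model lacks, rather than a single distorted ranking.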