FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models
March 13, 2024, 4:47 a.m. | Yan Liu, Renren Jin, Lin Shi, Zheng Yao, Deyi Xiong
Source: cs.CL updates on arXiv.org (arxiv.org)
Abstract: To thoroughly assess the mathematical reasoning abilities of Large Language Models (LLMs), we need to carefully curate evaluation datasets covering diverse mathematical concepts and mathematical problems at different difficulty levels. In pursuit of this objective, we propose FineMath in this paper, a fine-grained mathematical evaluation benchmark dataset for assessing Chinese LLMs. FineMath is created to cover the major key mathematical concepts taught in elementary school math, which are further divided into 17 categories of math …