EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
April 2, 2024, 7:51 p.m. | Jia Li, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin
cs.CL updates on arXiv.org
Abstract: How to evaluate Large Language Models (LLMs) in code generation is an open question. Existing benchmarks align poorly with real-world code repositories and are insufficient for evaluating the coding abilities of LLMs. This paper proposes a new benchmark, EvoCodeBench, to address these problems; it has three primary advances. (1) EvoCodeBench aligns with real-world repositories in multiple dimensions, e.g., code distributions and dependency distributions. (2) EvoCodeBench offers comprehensive annotations (e.g., requirements, reference code, …
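Concretely, a sample in a repository-aligned benchmark of this kind pairs a natural-language requirement with reference code and the repository-level dependencies that code relies on. Below is a minimal, hypothetical sketch in Python of what such an annotated sample could look like; the field names and values are illustrative assumptions, not EvoCodeBench's published schema.

```python
# Hypothetical sketch of a repository-level benchmark sample carrying the
# kinds of annotations the abstract mentions (requirements, reference code,
# dependencies). Field names are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class BenchmarkSample:
    repo: str                  # source repository the sample is drawn from
    path: str                  # file containing the target function
    signature: str             # function signature the model must complete
    requirement: str           # natural-language description of the task
    reference_code: str        # ground-truth implementation
    dependencies: list[str] = field(default_factory=list)  # repo-level symbols the reference uses

# Example instance (values are made up for demonstration).
sample = BenchmarkSample(
    repo="example/project",
    path="src/utils/io.py",
    signature="def load_config(path: str) -> dict:",
    requirement="Read a YAML config file at `path` and return its contents as a dict.",
    reference_code="...",  # elided here; a real benchmark ships the full body
    dependencies=["src.utils.yaml_loader.parse"],
)
```

Annotations like these let an evaluator prompt an LLM with the signature and requirement, then score the generated code against the reference and check whether the required repository dependencies are actually invoked.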