April 2, 2024, 7:51 p.m. | Jia Li, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin

cs.CL updates on arXiv.org

arXiv:2404.00599v1 Announce Type: new
Abstract: How to evaluate Large Language Models (LLMs) in code generation remains an open question. Existing benchmarks are poorly aligned with real-world code repositories and are insufficient for evaluating the coding abilities of LLMs. This paper proposes a new benchmark, EvoCodeBench, to address these problems. It has three primary advances. (1) EvoCodeBench aligns with real-world repositories in multiple dimensions, e.g., code distributions and dependency distributions. (2) EvoCodeBench offers comprehensive annotations (e.g., requirements, reference code, …
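
The excerpt does not show EvoCodeBench's evaluation harness, but repository-level code generation benchmarks of this kind are commonly scored with the Pass@k metric. Below is a minimal, illustrative Python sketch of the standard unbiased Pass@k estimator (from the Codex/HumanEval line of work); the per-task sample counts and the function name are hypothetical and not taken from the EvoCodeBench paper.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased Pass@k: probability that at least one of k samples
    # drawn from n generations (c of which pass the tests) is correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-task results: (n generations, c passing) for each task.
results = [(10, 3), (10, 0), (10, 7)]
k = 1
score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
print(f"Pass@{k}: {score:.3f}")

In practice, each task's generations would be produced by the LLM from the benchmark's requirement annotations and validated against the repository's tests before aggregating the scores.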

