all AI news
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
April 2, 2024, 7:51 p.m. | Jia Li, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin
cs.CL updates on arXiv.org arxiv.org
Abstract: How to evaluate Large Language Models (LLMs) in code generation is an open question. Existing benchmarks demonstrate poor alignment with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs. This paper proposes a new benchmark - EvoCodeBench to address the preceding problems, which has three primary advances. (1) EvoCodeBench aligns with real-world repositories in multiple dimensions, e.g., code distributions and dependency distributions. (2) EvoCodeBench offers comprehensive annotations (e.g., requirements, reference code, …
abstract alignment arxiv benchmark benchmarks code code generation coding cs.ai cs.cl cs.se language language models large language large language models llms paper question repositories type world
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US