CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

March 12, 2024, 4:52 a.m. | Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkua

cs.CL updates on arXiv.org arxiv.org

arXiv:2309.01940v4 Announce Type: replace
Abstract: With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is crucial as it reflects the multifaceted abilities of LLMs, and it has numerous downstream applications. In this paper, we propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs. The programming comprehension task tests LLMs …
