Feb. 27, 2024, 5:50 a.m. | Qiwei Peng, Yekun Chai, Xuhong Li

cs.CL updates on arXiv.org

arXiv:2402.16694v1 Announce Type: new
Abstract: Large language models (LLMs) have made significant progress in generating code from textual prompts. However, existing benchmarks have mainly concentrated on translating English prompts into code in multiple programming languages, or have been constrained to a very limited set of natural languages (NLs). These benchmarks have overlooked the vast landscape of massively multilingual NL to multilingual code, leaving a critical gap in the evaluation of multilingual LLMs. In response, we introduce HumanEval-XL, a massively multilingual code generation benchmark specifically crafted to …
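
To make the evaluation setup concrete, below is a minimal sketch of how a HumanEval-style item is typically scored: the model's completion is appended to the prompt and the result is executed against the task's test cases, with a pass counting as one successful trial (as in pass@1). The field names and the toy task here are hypothetical illustrations, not the actual HumanEval-XL schema.

    def check_completion(prompt: str, completion: str, test: str) -> bool:
        """Run the benchmark's test code against a model completion.
        Returns True if all assertions pass (one pass@1 trial)."""
        program = prompt + completion + "\n" + test
        env: dict = {}
        try:
            # NOTE: in a real harness, untrusted model output must be
            # executed in a sandboxed subprocess with a timeout.
            exec(program, env)
            return True
        except Exception:
            return False

    # Toy item with an English prompt; HumanEval-XL pairs each task with
    # parallel prompts in many natural languages (hypothetical example).
    item = {
        "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
        "test": "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n",
    }
    model_completion = "    return a + b\n"
    print(check_completion(item["prompt"], model_completion, item["test"]))  # True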
