Feb. 16, 2024, 5:47 a.m. | Changshu Liu, Shizhuo Dylan Zhang, Reyhaneh Jabbarvand

cs.CL updates on arXiv.org arxiv.org

arXiv:2402.09664v1 Announce Type: cross
Abstract: Solely relying on test passing to evaluate Large Language Models (LLMs) for code synthesis may result in unfair assessment or in promoting models that benefit from data leakage. As an alternative, we introduce CodeMind, a framework designed to gauge the code reasoning abilities of LLMs. CodeMind currently supports three code reasoning tasks: Independent Execution Reasoning (IER), Dependent Execution Reasoning (DER), and Specification Reasoning (SR). The first two evaluate a model's ability to predict the execution output of arbitrary code …
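The abstract only names the tasks, so as a rough illustration (not the authors' implementation), here is a minimal sketch of how an Independent Execution Reasoning (IER) check could be scored: the model is asked to predict the output of a code snippet for a given input, and the prediction is compared against the result of actually executing the code. The `ask_model` callable, the `run_snippet` helper, and the prompt wording are all hypothetical placeholders.

```python
# Rough sketch of an Independent Execution Reasoning (IER) check; not the
# CodeMind implementation. `ask_model` stands in for any callable that takes
# a prompt string and returns the model's reply as a string.
from typing import Callable


def run_snippet(code: str, entry: str, arg: object) -> str:
    """Ground truth: execute the snippet and return str() of its result.
    Assumes the snippet defines a function named `entry`."""
    ns: dict = {}
    exec(code, ns)
    return str(ns[entry](arg))


def ier_correct(ask_model: Callable[[str], str],
                code: str, entry: str, arg: object) -> bool:
    """Ask the model to predict the output for `arg`, then compare the
    prediction with the value obtained by actually running the code."""
    prompt = (
        f"Given this Python code, what does {entry}({arg!r}) return? "
        f"Answer with the value only.\n\n{code}"
    )
    return ask_model(prompt).strip() == run_snippet(code, entry, arg)


# Example with a stand-in "model" that always answers "3":
snippet = "def f(xs):\n    return sorted(xs)[-1]\n"
print(ier_correct(lambda p: "3", snippet, "f", [3, 1, 2]))  # True
```

A real harness would swap the lambda for an API call to the model under evaluation and aggregate accuracy over a benchmark of snippets and inputs; DER would additionally condition the prediction on code the model itself generated.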

