Feb. 12, 2024, 5:42 a.m. | Yufei Li, Simin Chen, Yanghong Guo, Wei Yang, Yue Dong, Cong Liu

cs.LG updates on arXiv.org

Large Language Models (LLMs) have been widely employed in programming language analysis to enhance human productivity. Yet their reliability can be compromised by various code distribution shifts, leading to inconsistent outputs. While probabilistic methods are known to mitigate such impacts through uncertainty calibration and estimation, their efficacy in the language domain remains underexplored compared to their application in image-based tasks. In this work, we first introduce a large-scale benchmark dataset incorporating three realistic patterns of code distribution shifts at varying …

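The excerpt does not say which probabilistic methods the benchmark evaluates, so as a point of reference, below is a minimal sketch of one standard post-hoc uncertainty-calibration technique, temperature scaling: a single scalar temperature is fit on held-out in-distribution logits to minimize negative log-likelihood, and the rescaled softmax then yields better-calibrated confidence scores that can be inspected under distribution shift. All names, parameters, and data below are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

class TemperatureScaler(nn.Module):
    # Post-hoc calibration with a single learned temperature T > 0.
    # Calibrated logits = logits / T; T is fit on in-distribution
    # validation data and then reused unchanged on shifted inputs.
    def __init__(self):
        super().__init__()
        self.log_t = nn.Parameter(torch.zeros(1))  # T = exp(log_t), so T stays positive

    def forward(self, logits):
        return logits / self.log_t.exp()

    def fit(self, logits, labels, steps=200, lr=0.05):
        # Minimize negative log-likelihood on a held-out validation split.
        opt = torch.optim.Adam([self.log_t], lr=lr)
        nll = nn.CrossEntropyLoss()
        for _ in range(steps):
            opt.zero_grad()
            nll(self(logits), labels).backward()
            opt.step()
        return self

# Hypothetical usage: random tensors stand in for a model's logits and
# gold labels on an in-distribution validation split.
val_logits = torch.randn(512, 10) * 3
val_labels = torch.randint(0, 10, (512,))
scaler = TemperatureScaler().fit(val_logits, val_labels)
probs = scaler(val_logits).softmax(dim=-1)
confidence = probs.max(dim=-1).values  # calibrated per-example confidence

Under this setup, a drop in calibrated confidence on shifted code inputs can serve as a simple signal of unreliable outputs; more elaborate estimators (e.g., deep ensembles or Monte Carlo dropout) follow the same fit-then-score pattern.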