Flames: Benchmarking Value Alignment of Chinese Large Language Models

April 2, 2024, 7:52 p.m. | Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin

cs.CL updates on arXiv.org

arXiv:2311.06899v2
Abstract: The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values. Current benchmarks, however, fall short of effectively uncovering safety vulnerabilities in LLMs. Despite numerous models achieving high scores and 'topping the chart' in these evaluations, there is still a significant gap in LLMs' deeper alignment with human values and achieving genuine harmlessness. To this end, this paper proposes a value alignment benchmark named …
