June 14, 2024, 4:42 a.m. | Junlei Zhang, Hongliang He, Nirui Song, Zhanchao Zhou, Shuyuan He, Shuai Zhang, Huachuan Qiu, Anqi Li, Yong Dai, Lizhi Ma, Zhenzhong Lan

cs.CL updates on arXiv.org

arXiv:2311.09861v3 Announce Type: replace
Abstract: The critical field of psychology necessitates a comprehensive benchmark to enhance the evaluation and development of domain-specific Large Language Models (LLMs). Existing MMLU-type benchmarks, such as C-EVAL and CMMLU, include psychology-related subjects, but their limited number of questions and the lack of a systematic concept-sampling strategy mean they cannot cover the concepts required in psychology. Consequently, despite their broad subject coverage, these benchmarks lack the necessary depth in the psychology domain, making them inadequate as psychology-specific …

