ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology
June 14, 2024, 4:42 a.m. | Junlei Zhang, Hongliang He, Nirui Song, Zhanchao Zhou, Shuyuan He, Shuai Zhang, Huachuan Qiu, Anqi Li, Yong Dai, Lizhi Ma, Zhenzhong Lan
cs.CL updates on arXiv.org arxiv.org
Abstract: The critical field of psychology necessitates a comprehensive benchmark to enhance the evaluation and development of domain-specific Large Language Models (LLMs). Existing MMLU-type benchmarks, such as C-EVAL and CMMLU, include psychology-related subjects, but their limited number of questions and lack of systematic concept sampling strategies mean they cannot cover the concepts required in psychology. Consequently, despite their broad subject coverage, these benchmarks lack the necessary depth in the psychology domain, making them inadequate as psychology-specific …