June 27, 2024, 4:42 a.m. | Pei Ke, Bosi Wen, Zhuoer Feng, Xiao Liu, Xuanyu Lei, Jiale Cheng, Shengyuan Wang, Aohan Zeng, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

cs.CL updates on arXiv.org

arXiv:2311.18702v2 Announce Type: replace
Abstract: Since the natural language processing (NLP) community started to make large language models (LLMs) act as critics to evaluate the quality of generated texts, most existing works train a critique generation model on evaluation data labeled by GPT-4's direct prompting. We observe that these models lack the ability to generate informative critiques in both pointwise grading and pairwise comparison, especially without references. As a result, their generated critiques cannot provide fine-grained …
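
To make the two evaluation settings mentioned in the abstract concrete, here is a minimal illustrative sketch (not the paper's method) of how an LLM critic might be prompted for reference-free pointwise grading versus pairwise comparison. The prompt wording and the `call_llm` stand-in are assumptions for illustration only.

```python
# Illustrative sketch: prompting an LLM critic in two common evaluation modes.
# Nothing here reflects the paper's actual prompts or training data; `call_llm`
# is a hypothetical placeholder for whatever inference API is available.

def pointwise_prompt(question: str, answer: str) -> str:
    """Ask the critic for a critique and a score of a single response (pointwise grading)."""
    return (
        "You are a critic evaluating an AI response.\n"
        f"Question: {question}\n"
        f"Response: {answer}\n"
        "Write a fine-grained critique of the response, then give a score from 1 to 10."
    )

def pairwise_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the critic to compare two responses and justify a preference (pairwise comparison)."""
    return (
        "You are a critic comparing two AI responses.\n"
        f"Question: {question}\n"
        f"Response A: {answer_a}\n"
        f"Response B: {answer_b}\n"
        "Critique both responses, then state which one is better and why."
    )

if __name__ == "__main__":
    q = "Explain why the sky is blue."
    a = "Because of Rayleigh scattering of sunlight by air molecules."
    b = "Because the ocean reflects its color onto the sky."
    print(pointwise_prompt(q, a))
    print(pairwise_prompt(q, a, b))
    # In practice, each prompt would be sent to a critique model, e.g.:
    # critique = call_llm(pointwise_prompt(q, a))  # `call_llm` is hypothetical
```

Note that both prompts are reference-free: the critic sees only the question and the response(s), which is exactly the setting where the abstract observes existing critique models fall short.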

