RepEval: Effective Text Evaluation with LLM Representation | allainews.com

May 1, 2024, 4:48 a.m. | Shuqian Sheng, Yi Xu, Tianhang Zhang, Zanwei Shen, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

cs.CL updates on arXiv.org

arXiv:2404.19563v1 Announce Type: new
Abstract: Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the first metric leveraging the projection of LLM representations for evaluation. RepEval requires minimal sample pairs …
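The abstract is cut off before it describes RepEval's procedure. The code below is therefore only a minimal sketch of the general idea it names: scoring a text by projecting its LLM representation onto a direction obtained from a few sample pairs. Several parts are assumptions and do not come from the source. Each pair is assumed to hold the representation of a better text and a worse text. The direction is taken to be the normalised mean difference between those representations. Toy vectors stand in for real LLM hidden states. The names `direction_from_pairs` and `projection_score` are hypothetical, and RepEval itself may derive its direction differently.

```python
import math
from typing import List, Sequence, Tuple

# Toy stand-in for an LLM hidden-state vector (assumption: real vectors
# would be taken from some layer of the evaluating model).
Vector = List[float]


def direction_from_pairs(pairs: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Hypothetical helper: unit vector along the mean (better - worse) difference."""
    if not pairs:
        raise ValueError("need at least one sample pair")
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for better, worse in pairs:
        if len(better) != dim or len(worse) != dim:
            raise ValueError("all vectors must share one dimension")
        for i in range(dim):
            total[i] += better[i] - worse[i]
    # Normalising the summed difference yields the same unit direction as
    # normalising the mean difference; they differ only by a positive factor.
    norm = math.sqrt(sum(x * x for x in total))
    if norm == 0.0:
        raise ValueError("pairs do not separate along any direction")
    return [x / norm for x in total]


def projection_score(rep: Vector, direction: Vector) -> float:
    """Scalar projection of one text's representation onto the direction."""
    return sum(r * d for r, d in zip(rep, direction))


if __name__ == "__main__":
    # Toy 3-d "representations" of (higher-quality, lower-quality) texts.
    pairs = [
        ([1.0, 0.2, 0.0], [0.0, 0.2, 0.0]),
        ([0.9, 0.5, 0.1], [0.1, 0.5, 0.1]),
    ]
    direction = direction_from_pairs(pairs)
    candidates = {"text_a": [0.8, 0.3, 0.0], "text_b": [0.1, 0.9, 0.0]}
    # Rank candidate texts by their projection score, highest first.
    ranked = sorted(candidates.items(), key=lambda kv: -projection_score(kv[1], direction))
    for name, rep in ranked:
        print(name, round(projection_score(rep, direction), 3))
```

In this sketch a larger projection score simply means the representation lies further along the "better" side of the learned direction. Whether RepEval uses a mean difference, another decomposition of the pair differences, or a different scoring rule cannot be determined from the truncated abstract.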
