One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | allainews.com

Feb. 20, 2024, 5:51 a.m. | Tejpalsingh Siledar, Swaroop Nath, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Swaprava Nath, Pushpak Bhattacharyya, Suman Banerjee, Amey P

cs.CL updates on arXiv.org

arXiv:2402.11683v1 Announce Type: new
Abstract: Evaluation of opinion summaries using conventional reference-based metrics rarely provides a holistic evaluation and has been shown to have a relatively low correlation with human judgments. Recent studies suggest using Large Language Models (LLMs) as reference-free metrics for NLG evaluation; however, they remain unexplored for opinion summary evaluation. Moreover, the limited availability of opinion summary evaluation datasets inhibits progress. To address this, we release the SUMMEVAL-OP dataset covering 7 dimensions related to the evaluation of opinion summaries: fluency, …
