Feb. 20, 2024, 5:53 a.m. | Yiqi Liu, Nafise Sadat Moosavi, Chenghua Lin

cs.CL updates on arXiv.org

arXiv:2311.09766v2 Announce Type: replace
Abstract: Automatic evaluation of generated textual content presents an ongoing challenge within the field of NLP. Given the impressive capabilities of modern language models (LMs) across diverse NLP tasks, there is a growing trend to employ these models in creating innovative evaluation metrics for automated assessment of generation tasks. This paper investigates a pivotal question: Do language model-driven evaluation metrics inherently exhibit bias favoring texts generated by the same underlying language model? Specifically, we assess whether …
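The paper's central question, whether an LM-based judge scores texts from its own underlying model more favorably, can be framed as a simple paired comparison. The sketch below is illustrative only and is not the paper's method (the truncated abstract does not describe the experimental setup): `self_preference_gap`, the injected `judge` callable, and the toy scorer are all hypothetical placeholders.

```python
"""Minimal sketch of a self-preference check for an LM-based evaluation
metric, assuming paired outputs for the same inputs. The judge is passed
in as a callable so any LM-as-judge metric can be plugged in; the toy
scorer below is only for demonstration, not a real LM call."""

from statistics import mean
from typing import Callable


def self_preference_gap(
    judge: Callable[[str, str], float],
    inputs: list[str],
    own_outputs: list[str],    # texts generated by the judge's own underlying LM
    other_outputs: list[str],  # texts generated by a different LM
) -> float:
    """Mean judge score on own-model texts minus mean score on
    other-model texts over the same inputs; a consistently positive
    gap would be consistent with self-bias."""
    own = mean(judge(src, cand) for src, cand in zip(inputs, own_outputs))
    other = mean(judge(src, cand) for src, cand in zip(inputs, other_outputs))
    return own - other


if __name__ == "__main__":
    # Toy judge that favors longer candidates; stands in for a real
    # LM-as-judge scoring call.
    toy_judge = lambda src, cand: min(len(cand) / 100.0, 1.0)

    inputs = ["Summarize the article.", "Paraphrase the sentence."]
    own = ["A long, detailed summary of the article's content.",
           "A long, elaborate paraphrase of the sentence."]
    other = ["Short summary.", "Brief paraphrase."]

    print(f"score gap (own - other): "
          f"{self_preference_gap(toy_judge, inputs, own, other):+.3f}")
```

In practice the judge would wrap an actual LM scoring prompt; keeping it as a parameter makes the same comparison reusable across different evaluator models.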

