Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Feb. 27, 2024, 5:49 a.m. | Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki
Source: cs.CL updates on arXiv.org
Abstract: Large Language Models (LLMs) are widely used as automated metrics for evaluating natural language generation tasks. However, the likelihood, a measure of how plausible an LLM finds a sentence, can vary with superficial differences between sentences, such as word order and sentence structure. Using LLMs for evaluation may therefore introduce a likelihood bias: they might overrate sentences with higher likelihoods while underrating those with lower likelihoods. In this paper, …
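To make the idea of a likelihood bias concrete, here is a minimal sketch, not the paper's method (the abstract is truncated before the method is described). It assumes a sentence's log-likelihood is the sum of its per-token log-probabilities, and it treats the correlation between that log-likelihood and the gap between an LLM score and a human score as an illustrative bias indicator. All function names and numbers below are hypothetical.

```python
import math


def sentence_log_likelihood(token_logprobs):
    """Sum per-token log-probabilities to get log P(sentence) under a model."""
    return sum(token_logprobs)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def likelihood_bias(log_likelihoods, llm_scores, human_scores):
    """Illustrative indicator (an assumption, not the paper's definition):
    correlation between sentence log-likelihood and how far the LLM's score
    sits above the human score. Positive values suggest that more likely
    sentences are overrated and less likely ones underrated."""
    gaps = [llm - human for llm, human in zip(llm_scores, human_scores)]
    return pearson(log_likelihoods, gaps)


# Made-up per-token log-probabilities for three candidate sentences.
token_logprobs = [
    [-0.2, -0.5, -0.3],
    [-1.0, -1.5, -0.5],
    [-2.0, -2.5, -1.5],
]
log_likelihoods = [sentence_log_likelihood(t) for t in token_logprobs]

# Made-up scores: humans rate all three equally, the LLM does not.
llm_scores = [5.0, 4.0, 2.0]
human_scores = [4.0, 4.0, 4.0]

bias = likelihood_bias(log_likelihoods, llm_scores, human_scores)
print(f"illustrative likelihood bias: {bias:.3f}")
```

In this toy data the humans see no quality difference, yet the LLM's scores fall as the log-likelihood falls, so the indicator comes out strongly positive. In practice the per-token log-probabilities would come from the evaluated model itself, and a rank correlation may be a more robust choice than Pearson.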