Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Feb. 27, 2024, 5:49 a.m. | Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki
cs.CL updates on arXiv.org
Abstract: Large Language Models (LLMs) are widely used as automated metrics to evaluate natural language generation tasks. However, the likelihood, a measure of how plausible an LLM finds a sentence, can vary with superficial differences between sentences, such as word order and sentence structure. LLM-based evaluation may therefore suffer from a likelihood bias: evaluators may overrate sentences with higher likelihoods while underrating those with lower likelihoods. In this paper, …
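One simple way to make the notion of likelihood bias concrete is to check whether an evaluator LLM's score deviations from human judgments correlate with sentence likelihood. The sketch below is purely illustrative and is not the paper's method: the function names, the bias indicator (a Pearson correlation between log-likelihood and score deviation), and the toy data are all assumptions for exposition.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def likelihood_bias_score(log_likelihoods, llm_scores, human_scores):
    """Hypothetical bias indicator: correlation between a sentence's
    log-likelihood under the evaluator LLM and how far the LLM's score
    deviates from a human reference score. A strongly positive value
    suggests the evaluator overrates high-likelihood sentences."""
    deviations = [l - h for l, h in zip(llm_scores, human_scores)]
    return pearson(log_likelihoods, deviations)

# Toy data (entirely made up): higher-likelihood sentences receive
# inflated LLM scores relative to the human judgments.
log_liks = [-10.0, -25.0, -40.0, -55.0]
llm_scores = [4.8, 4.0, 2.5, 1.5]
human_scores = [4.0, 4.0, 3.0, 2.5]
print(round(likelihood_bias_score(log_liks, llm_scores, human_scores), 3))  # 0.992
```

A value near zero on real data would indicate that the evaluator's deviations from human scores are unrelated to likelihood; values near 1 indicate the bias the abstract describes.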