Towards Explainable Evaluation Metrics for Machine Translation | allainews.com

Jan. 1, 2024 | Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger

JMLR www.jmlr.org

Unlike classical lexical-overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, possibly because their decision processes are more transparent. To foster wider acceptance of novel high-quality metrics, explainability therefore becomes crucial. In this concept paper, we identify key properties …
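The transparency attributed to lexical-overlap metrics such as BLEU comes from their structure. The score is a product of clipped n-gram precisions and a brevity penalty, and every one of those factors can be inspected. The snippet below is a minimal illustrative sketch and not a reference BLEU implementation such as sacreBLEU. It assumes whitespace tokenization, a single reference, and no smoothing, and the helper names (`ngrams`, `modified_precision`, `sentence_bleu`) are hypothetical.

```python
from collections import Counter
import math

# Simplified sketch (assumption): whitespace tokens, one reference, no smoothing.
# This is not sacreBLEU and will not match its scores exactly.


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Clipped n-gram precision, plus the credited n-grams for inspection."""
    hyp_counts = ngrams(hyp, n)
    ref_counts = ngrams(ref, n)
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = {g: min(c, ref_counts[g]) for g, c in hyp_counts.items() if g in ref_counts}
    total = sum(hyp_counts.values())
    precision = sum(matched.values()) / total if total else 0.0
    return precision, matched


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Returns (score, per-order precisions, per-order matched n-grams), so every
    factor that contributed to the score remains visible to the user.
    """
    precisions, matched = [], []
    for n in range(1, max_n + 1):
        p, m = modified_precision(hyp, ref, n)
        precisions.append(p)
        matched.append(m)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(precisions) == 0.0:
        return 0.0, precisions, matched
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize hypotheses that are not longer than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_avg), precisions, matched


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on a mat".split()
    score, precisions, matched = sentence_bleu(hyp, ref)
    print(f"score={score:.4f}")
    for n, (p, m) in enumerate(zip(precisions, matched), start=1):
        print(f"{n}-gram precision={p:.3f}, credited={sorted(m)}")
```

When the hypothesis is an exact copy of the reference, every precision and the brevity penalty equal 1.0, so the score is 1.0. When a single word differs, the returned `matched` dictionaries show exactly which n-grams were credited and which were not. A learned metric such as COMET or BERTScore offers no decomposition this direct.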

