all AI news
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
Feb. 22, 2024, 5:48 a.m. | Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, Dacheng Tao
cs.CL updates on arXiv.org arxiv.org
Abstract: Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks, such as machine translation, text summarization. Recent research (Kocmi and Federmann, 2023) has shown that utilizing LLMs for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level but \textit{performs poorly at the segment level}. To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting …
abstract analysis art arxiv chatgpt cs.cl error evaluation generative human human-like language language models large language large language models llms machine machine translation nlp performance prompting quality research state summarization tasks text text summarization translation type
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Business Data Analyst
@ Alstom | Johannesburg, GT, ZA