April 11, 2024, 4:47 a.m. | Yongqiang Ma, Lizhi Qin, Jiawei Liu, Yangyang Kang, Yue Zhang, Wei Lu, Xiaozhong Liu, Qikai Cheng

cs.CL updates on arXiv.org

arXiv:2404.07108v1 Announce Type: new
Abstract: Evaluating large language models (LLMs) is fundamental, particularly in the context of practical applications. Conventional evaluation methods, typically designed primarily for LLM development, yield numerical scores that ignore the user experience. Therefore, our study shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications. Our proposed metric, termed ``Revision Distance,'' utilizes LLMs to suggest revision edits that mimic the human writing process. It is determined by counting the revision …

