April 17, 2024, 4:43 a.m. | Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

cs.LG updates on arXiv.org

arXiv:2310.07641v2 Announce Type: replace-cross
Abstract: As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluation for comparing the ever-increasing list of models. This paper investigates the efficacy of these "LLM evaluators", particularly their use in assessing instruction following, a metric that gauges how closely generated text adheres to the given instruction. We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an …
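As a rough illustration of the setup the abstract describes, the sketch below shows a pairwise "LLM evaluator" for instruction following and a simple meta-evaluation accuracy over labeled pairs. The `query_llm` helper, the prompt wording, and the `gold` field are hypothetical assumptions for illustration only, not LLMBar's actual protocol.

```python
# Minimal sketch of an LLM evaluator for instruction following, plus the kind
# of meta-evaluation a benchmark like LLMBar performs on top of it.

def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with a real client call."""
    raise NotImplementedError

EVALUATOR_PROMPT = """You are comparing two responses to the same instruction.
Pick the response that follows the instruction more faithfully, ignoring
superficial qualities such as length or style.

Instruction: {instruction}

Response A: {output_a}

Response B: {output_b}

Answer with a single letter, A or B."""


def pairwise_judge(instruction: str, output_a: str, output_b: str) -> str:
    """Ask the LLM evaluator which output better follows the instruction."""
    prompt = EVALUATOR_PROMPT.format(
        instruction=instruction, output_a=output_a, output_b=output_b
    )
    verdict = query_llm(prompt).strip().upper()
    return "A" if verdict.startswith("A") else "B"


def meta_eval_accuracy(evaluator, examples) -> float:
    """Meta-evaluation: fraction of pairs where the evaluator's preference
    matches the gold label (the output that objectively follows the instruction)."""
    correct = sum(
        evaluator(ex["instruction"], ex["output_a"], ex["output_b"]) == ex["gold"]
        for ex in examples
    )
    return correct / len(examples)
```

In this framing, the benchmark supplies the labeled pairs and the evaluator's accuracy against the gold preferences is the headline number; the paper's contribution is making those pairs adversarially hard for superficial judges.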

