DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents
Feb. 26, 2024, 5:42 a.m. | Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie
cs.LG updates on arXiv.org
Abstract: Evaluation of large language models (LLMs) has raised serious concerns in the community due to the issue of data contamination. Existing work has designed evaluation protocols with well-defined algorithms for specific tasks, but these cannot be easily extended to diverse scenarios. Moreover, current evaluation benchmarks provide only overall results and cannot support a fine-grained, multifaceted analysis of LLMs' abilities. In this paper, we propose meta probing agents (MPA), a general dynamic evaluation protocol …
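The abstract's core idea is that a static benchmark can be memorized during pre-training, whereas a dynamic protocol regenerates fresh evaluation items on demand. The sketch below is a minimal, hypothetical illustration of that principle (not the paper's actual MPA method): a seeded generator instantiates a question template with new values each run, so the exact test item is unlikely to appear in any training corpus while the probed skill stays fixed.

```python
import random


def generate_dynamic_item(question_template, seed):
    """Instantiate a fresh evaluation item from a template, so a model
    cannot rely on having memorized a fixed benchmark instance.

    `question_template` and the arithmetic task here are illustrative
    placeholders, not part of the MPA protocol described in the paper.
    """
    rng = random.Random(seed)  # seeded for reproducible evaluation runs
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = question_template.format(a=a, b=b)
    answer = a + b  # gold answer is recomputed alongside the new item
    return question, answer


# Each evaluation run draws a new instance of the same underlying skill.
q1, ans1 = generate_dynamic_item("What is {a} + {b}?", seed=1)
q2, ans2 = generate_dynamic_item("What is {a} + {b}?", seed=2)
```

Because the gold answer is derived from the freshly sampled values, the protocol can score models on an effectively unbounded item pool while still being reproducible from the seed.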