Feb. 26, 2024, 5:42 a.m. | Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie

cs.LG updates on arXiv.org

arXiv:2402.14865v1 Announce Type: cross
Abstract: Evaluation of large language models (LLMs) has raised serious concerns in the community due to data contamination. Existing work has designed evaluation protocols using well-defined algorithms for specific tasks, which cannot easily be extended to diverse scenarios. Moreover, current evaluation benchmarks provide only overall results and do not support a fine-grained, multifaceted analysis of LLMs' abilities. In this paper, we propose meta probing agents (MPA), a general dynamic evaluation protocol …
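
The announcement truncates before the protocol's mechanics, so the sketch below only illustrates the general idea of dynamic evaluation that the abstract motivates: transform each benchmark item before scoring, so that contamination on the original items cannot inflate results. All names here (`probe_question`, `judge_consistency`, `dynamic_eval`) and the `llm` callable are hypothetical stand-ins for illustration, not the paper's actual API.

```python
# A minimal sketch of a dynamic evaluation loop, assuming `llm` is any
# str -> str callable. Function names are hypothetical, not the MPA API.

def probe_question(llm, question: str) -> str:
    """Have a probing agent (itself an LLM) rephrase a benchmark item so a
    contaminated model cannot match it against memorized surface forms."""
    prompt = (
        "Rewrite the following question with different wording, "
        "keeping its answer unchanged:\n" + question
    )
    return llm(prompt)


def judge_consistency(llm, original: str, variant: str) -> bool:
    """Have a judging agent check that the rewrite still admits the same answer."""
    prompt = (
        "Do these two questions have the same answer? Reply YES or NO.\n"
        f"Q1: {original}\nQ2: {variant}"
    )
    return llm(prompt).strip().upper().startswith("YES")


def dynamic_eval(llm, benchmark: list[tuple[str, str]]) -> float:
    """Score `llm` on dynamically transformed items instead of the
    (possibly contaminated) originals; returns accuracy in [0, 1]."""
    correct = 0
    for question, answer in benchmark:
        variant = probe_question(llm, question)
        if not judge_consistency(llm, question, variant):
            variant = question  # fall back if the transform drifted
        correct += answer.lower() in llm(variant).lower()
    return correct / len(benchmark)
```

In a real setup the probing and judging agents would be a separate, trusted model rather than the model under evaluation, and each transform could target a specific ability to support the fine-grained analysis the abstract mentions.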
