Feb. 26, 2024, 5:42 a.m. | Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie

cs.LG updates on arXiv.org

arXiv:2402.14865v1 Announce Type: cross
Abstract: Evaluation of large language models (LLMs) has raised serious concerns in the community due to the issue of data contamination. Existing work has designed evaluation protocols using well-defined algorithms for specific tasks, which cannot easily be extended to diverse scenarios. Moreover, current evaluation benchmarks only provide overall results and cannot support a fine-grained, multifaceted analysis of LLMs' abilities. In this paper, we propose meta probing agents (MPA), a general dynamic evaluation protocol …
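The abstract is cut off before the protocol itself is described, so the following is only a minimal, hypothetical sketch of what a dynamic evaluation loop driven by probing agents might look like: one agent rewrites a benchmark question so its surface form changes (reducing contamination), a second agent checks that the rewrite preserves the gold answer, and the model under test is scored on the surviving variants. All function names, prompts, and the probe/judge split here are assumptions for illustration, not the paper's actual MPA design.

```python
# Hypothetical sketch of agent-based dynamic evaluation.
# `call_llm` is a placeholder for any chat-completion API.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; wire this to your model of choice."""
    raise NotImplementedError

def probe(sample: dict) -> dict:
    """Probing agent: paraphrase a benchmark question so its wording
    changes while its answer stays fixed."""
    new_q = call_llm(
        "Paraphrase this question without changing its answer:\n"
        + sample["question"]
    )
    return {"question": new_q, "answer": sample["answer"]}

def judge(original: dict, transformed: dict) -> bool:
    """Judge agent: verify the rewrite still has the same gold answer."""
    verdict = call_llm(
        "Do these two questions have the same answer? Reply YES or NO.\n"
        f"Q1: {original['question']}\nQ2: {transformed['question']}"
    )
    return verdict.strip().upper().startswith("YES")

def dynamic_eval(benchmark: list[dict], model) -> float:
    """Score `model` on agent-transformed copies of a static benchmark."""
    correct = 0
    for sample in benchmark:
        variant = probe(sample)
        if not judge(sample, variant):  # discard unfaithful rewrites
            variant = sample
        pred = model(variant["question"])
        correct += pred.strip() == variant["answer"]
    return correct / len(benchmark)
```

In such a setup the transformed benchmark could be regenerated for every evaluation run, so a model cannot benefit from having memorized the static test set.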
