May 15, 2024, 4:47 a.m. | Chenghao Zhu, Nuo Chen, Yufei Gao, Benyou Wang

arXiv:2405.08460v1
Abstract: The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of LLMs in ever-changing real-world scenarios. Furthermore, these benchmarks do not adequately measure the models' capabilities over a broader temporal range or their …

