all AI news
XL$^2$Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies
April 9, 2024, 4:51 a.m. | Xuanfan Ni, Hengyi Cai, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Piji Li
cs.CL updates on arXiv.org arxiv.org
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to 200K input tokens. Meanwhile, building high-quality benchmarks with much longer text lengths and more demanding tasks to provide comprehensive evaluations is of immense practical interest to facilitate long context understanding research of LLMs. However, prior benchmarks create datasets that ostensibly …
abstract arxiv benchmark benchmarks building context context window cs.cl dependencies diverse expand language language models large language large language models llms performance quality small tasks tokens type understanding
More from arxiv.org / cs.CL updates on arXiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Machine Learning Engineer
@ Apple | Sunnyvale, California, United States