XL$^2$Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies | allainews.com

April 9, 2024, 4:51 a.m. | Xuanfan Ni, Hengyi Cai, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Piji Li

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.05446v1 Announce Type: new
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to 200K input tokens. Meanwhile, building high-quality benchmarks with much longer text lengths and more demanding tasks to provide comprehensive evaluations is of immense practical interest to facilitate long context understanding research of LLMs. However, prior benchmarks create datasets that ostensibly …

abstract arxiv benchmark benchmarks building context context window cs.cl dependencies diverse expand language language models large language large language models llms performance quality small tasks tokens type understanding

More from arxiv.org / cs.CL updates on arXiv.org

Sparse is Enough in Fine-tuning Pre-trained Large Language Models 3 days ago | arxiv.org

arxiv cs.ai cs.cl cs.lg +6

On the Learnability of Watermarks for Language Models 3 days ago | arxiv.org

abstract arxiv cs.cl cs.cr +17

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization 3 days ago | arxiv.org

abstract arxiv capabilities cs.ai +14

Evaluating Generative Ad Hoc Information Retrieval 3 days ago | arxiv.org

abstract advances arxiv cs.cl +19

Language Models As Semantic Indexers 3 days ago | arxiv.org

arxiv cs.cl cs.ir cs.lg +4

Large language models can accurately predict searcher preferences 3 days ago | arxiv.org

abstract arxiv cs.ai cs.cl +16

On the Reliability of Watermarks for Large Language Models 3 days ago | arxiv.org

abstract arxiv become bots +28

A Watermark for Large Language Models 3 days ago | arxiv.org

abstract arxiv cs.cl cs.cr +16

CreoleVal: Multilingual Multitask Benchmarks for Creoles 3 days ago | arxiv.org

abstract annotated data arxiv benchmarks +14

Founding AI Engineer, Agents

@ Occam AI | New York

View on ai-jobs.net

AI Engineer Intern, Agents

@ Occam AI | US

View on ai-jobs.net

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Machine Learning Engineer

@ Apple | Sunnyvale, California, United States

View on ai-jobs.net