Marathon: A Race Through the Realm of Long Context with Large Language Models
June 27, 2024, 4:42 a.m. | Lei Zhang, Yunshui Li, Ziqiang Liu, Jiaxi Yang, Junhao Liu, Longze Chen, Run Luo, Min Yang
cs.CL updates on arXiv.org
Abstract: With the advancement of large language models (LLMs) and the expansion of their context windows, existing long-context benchmarks fall short in effectively evaluating the models' comprehension and reasoning abilities in extended texts. Moreover, conventional benchmarks relying on F1 metrics often inaccurately score responses: they may undervalue correct answers that differ from the reference responses and overvalue incorrect ones that resemble the reference texts. In response to these limitations, we introduce Marathon, a novel evaluation benchmark …
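The F1 failure mode the abstract describes can be illustrated with a token-overlap F1 score of the kind commonly used in QA benchmarks; the example answers below are hypothetical, chosen only to show a correct paraphrase scoring lower than a wrong answer that copies the reference's wording:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

reference = "the meeting was moved to next friday"

# A correct paraphrase sharing few surface tokens is undervalued...
paraphrase = "it was rescheduled for the following friday"

# ...while an incorrect answer that echoes the reference is overvalued.
wrong_copy = "the meeting was moved to last friday"

print(round(token_f1(paraphrase, reference), 2))   # ≈ 0.43
print(round(token_f1(wrong_copy, reference), 2))   # ≈ 0.86
```

The wrong answer outscores the right one purely on lexical overlap, which is the kind of misscoring Marathon is designed to avoid.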