April 3, 2024, 4:47 a.m. | Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.02060v1 Announce Type: new
Abstract: Large Language Models (LLMs) have made significant strides in handling long sequences exceeding 32K tokens. However, their performance evaluation has largely been confined to metrics like perplexity and synthetic tasks, which may not fully capture their abilities in more nuanced, real-world scenarios. This study introduces a specialized benchmark (LIConBench) focusing on long in-context learning within the realm of extreme-label classification. We meticulously selected six datasets with a label range spanning 28 to 174 classes covering …
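The abstract describes prompts built from labeled demonstrations across a large label space. As a rough illustration, a minimal sketch of assembling such a many-shot in-context classification prompt is below; the dataset, labels, and formatting are illustrative assumptions, not the benchmark's actual protocol.

```python
def build_icl_prompt(examples, query):
    """Concatenate labeled demonstrations, then the unlabeled query."""
    lines = []
    for text, label in examples:
        lines.append(f"Text: {text}\nLabel: {label}\n")
    # The model is expected to continue after the final "Label:".
    lines.append(f"Text: {query}\nLabel:")
    return "\n".join(lines)

# Hypothetical demonstrations; with on the order of 174 classes and
# multiple rounds of demonstrations per class, such prompts can easily
# exceed 32K tokens.
demos = [
    ("order a pizza", "food_order"),
    ("book a flight", "travel_booking"),
]
prompt = build_icl_prompt(demos, "reserve a hotel room")
print(prompt.endswith("Label:"))  # True
```

The prompt length grows linearly with the number of demonstrations, which is why extreme-label settings stress long-context handling.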

