April 3, 2024, 4:47 a.m. | Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.02060v1 Announce Type: new
Abstract: Large Language Models (LLMs) have made significant strides in handling long sequences exceeding 32K tokens. However, their performance evaluation has largely been confined to metrics like perplexity and synthetic tasks, which may not fully capture their abilities in more nuanced, real-world scenarios. This study introduces a specialized benchmark (LIConBench) focusing on long in-context learning within the realm of extreme-label classification. We meticulously selected six datasets with a label range spanning 28 to 174 classes covering …
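The abstract describes prompts built from labeled demonstrations across a large label space. As a rough illustration, a minimal sketch of assembling such a many-shot in-context classification prompt is below; the dataset, labels, and formatting are illustrative assumptions, not the benchmark's actual protocol.

```python
def build_icl_prompt(examples, query):
    """Concatenate labeled demonstrations, then the unlabeled query."""
    lines = []
    for text, label in examples:
        lines.append(f"Text: {text}\nLabel: {label}\n")
    # The model is expected to continue after the final "Label:".
    lines.append(f"Text: {query}\nLabel:")
    return "\n".join(lines)

# Hypothetical demonstrations; with on the order of 174 classes and
# multiple rounds of demonstrations per class, such prompts can easily
# exceed 32K tokens.
demos = [
    ("order a pizza", "food_order"),
    ("book a flight", "travel_booking"),
]
prompt = build_icl_prompt(demos, "reserve a hotel room")
print(prompt.endswith("Label:"))  # True
```

The prompt length grows linearly with the number of demonstrations, which is why extreme-label settings stress long-context handling.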

