Fact or Fiction? NOCHA: A New Benchmark for Evaluating Long-Context Reasoning in LLMs

June 28, 2024, 6:45 a.m. | /u/ai-lover

machinelearningnews www.reddit.com

Researchers from UMass Amherst, the Allen Institute for AI, and Princeton University have introduced a new evaluation methodology called NOCHA (Narrative Open-Contextualized Human Annotation), designed to assess the performance of long-context language models more accurately. NOCHA collects narrative minimal pairs: two claims about the same book, one true and one false, both written by people who have read that book.
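To make the minimal-pair idea concrete, here is a small Python sketch of how such a pair might be represented and scored. Everything in it is an illustrative assumption rather than the researchers' code: the `MinimalPair` class, the `judge` callback, and the pair-level scoring rule (a pair counts only if the true claim is accepted and the false claim is rejected) are hypothetical, and the excerpt above does not say how NOCHA itself is scored.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MinimalPair:
    """One narrative minimal pair (hypothetical structure)."""
    book: str         # identifier of the book both claims refer to
    true_claim: str   # claim that is true of the book
    false_claim: str  # closely related claim that is false


# A judge takes (book, claim) and returns True if it thinks the claim is true.
# In practice this would wrap a long-context model; here it is any callable.
Judge = Callable[[str, str], bool]


def pair_accuracy(pairs: Sequence[MinimalPair], judge: Judge) -> float:
    """Fraction of pairs where BOTH claims are labeled correctly.

    Assumed scoring rule: a judge that answers "true" to everything
    gets no credit, because it fails on every false claim.
    """
    if not pairs:
        return 0.0
    correct = sum(
        1
        for p in pairs
        if judge(p.book, p.true_claim) and not judge(p.book, p.false_claim)
    )
    return correct / len(pairs)


# Placeholder data, made up purely for illustration.
pairs = [
    MinimalPair("Example Novel", "claim A (true)", "claim A' (false)"),
    MinimalPair("Example Novel", "claim B (true)", "claim B' (false)"),
]

always_true = lambda book, claim: True
print(pair_accuracy(pairs, always_true))
```

Under this assumed rule, a judge cannot score well by guessing one label for every claim, which is the main appeal of evaluating on pairs rather than on isolated claims.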

The NOCHA methodology involves collecting narrative minimal pairs from recently published fictional books. Annotators familiar with these books generate pairs of …


