Evaluating Large Language Models Using Contrast Sets: An Experimental Approach
April 3, 2024, 4:42 a.m. | Manish Sanwal
cs.LG updates on arXiv.org | arxiv.org
Abstract: In the domain of Natural Language Inference (NLI), especially in tasks involving the classification of multiple input texts, the Cross-Entropy Loss metric is widely employed as a standard for error measurement. However, this metric falls short in effectively evaluating a model's capacity to understand language entailments. In this study, we introduce an innovative technique for generating a contrast set for the Stanford Natural Language Inference (SNLI) dataset. Our strategy involves the automated substitution of verbs, …
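For readers unfamiliar with the metric the abstract refers to, here is a minimal sketch of cross-entropy loss on a single three-way NLI prediction. The label order, the probability values, and the helper name `cross_entropy` are illustrative assumptions, not taken from the paper. The sketch only shows that the loss measures how much probability a model puts on the gold label, which is the quantity the abstract argues is insufficient on its own.

```python
import math

# Assumed label order for illustration; SNLI uses these three classes.
LABELS = ("entailment", "neutral", "contradiction")


def cross_entropy(probs, gold):
    """Return the negative log-probability assigned to the gold label."""
    return -math.log(probs[LABELS.index(gold)])


# Hypothetical model outputs for one premise/hypothesis pair
# whose gold label is "entailment".
confident = [0.90, 0.07, 0.03]
unsure = [0.40, 0.35, 0.25]

loss_confident = cross_entropy(confident, "entailment")
loss_unsure = cross_entropy(unsure, "entailment")

# A more confident correct prediction yields a lower loss.
print(loss_confident < loss_unsure)
```

Both predictions pick the correct label, yet the loss alone says nothing about whether either would survive a small change to the input sentence, which is the gap a contrast set is meant to probe.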