Instruction-following Evaluation through Verbalizer Manipulation

April 3, 2024, 4:47 a.m. | Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin

cs.CL updates on arXiv.org

arXiv:2307.10558v2 Announce Type: replace
Abstract: While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the …


