all AI news
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Feb. 22, 2024, 5:48 a.m. | Vyas Raina, Adian Liusie, Mark Gales
cs.CL updates on arXiv.org arxiv.org
Abstract: Large Language Models (LLMs) are powerful zero-shot assessors and are increasingly used in real-world situations such as for written exams or benchmarking systems. Despite this, no existing work has analyzed the vulnerability of judge-LLMs against adversaries attempting to manipulate outputs. This work presents the first study on the adversarial robustness of assessment LLMs, where we search for short universal phrases that when appended to texts can deceive LLMs to provide high assessment scores. Experiments on …
abstract adversarial adversarial attacks arxiv assessment attacks benchmarking cs.cl exams judge language language models large language large language models llm llms robust systems type vulnerability work world zero-shot
More from arxiv.org / cs.CL updates on arXiv.org
Benchmarking LLMs via Uncertainty Quantification
1 day, 11 hours ago |
arxiv.org
CARE: Extracting Experimental Findings From Clinical Literature
1 day, 11 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Principal Applied Scientist
@ Microsoft | Redmond, Washington, United States
Data Analyst / Action Officer
@ OASYS, INC. | OASYS, INC., Pratt Avenue Northwest, Huntsville, AL, United States