Feb. 22, 2024, 5:48 a.m. | Vyas Raina, Adian Liusie, Mark Gales

cs.CL updates on arXiv.org

arXiv:2402.14016v1 Announce Type: new
Abstract: Large Language Models (LLMs) are powerful zero-shot assessors and are increasingly used in real-world situations such as for written exams or benchmarking systems. Despite this, no existing work has analyzed the vulnerability of judge-LLMs against adversaries attempting to manipulate outputs. This work presents the first study on the adversarial robustness of assessment LLMs, where we search for short universal phrases that, when appended to texts, can deceive LLMs into providing high assessment scores. Experiments on …
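To make the attack setting concrete, below is a minimal sketch (not the authors' code) of how a short universal phrase could be searched for greedily: each step appends whichever candidate word most raises the average score a judge returns for a set of training texts. The word list, training texts, and `judge_score` stub are illustrative assumptions; in practice `judge_score` would wrap a real judge-LLM call.

```python
# Hedged sketch of a greedy universal adversarial-phrase search against a
# judge-LLM scorer. Everything below (word list, texts, scoring stub) is
# illustrative; replace judge_score with an actual assessment-model call.

import random

CANDIDATE_WORDS = ["excellent", "comprehensive", "insightful", "rigorous", "flawless"]
TRAIN_TEXTS = [
    "The essay argues that renewable energy adoption is slowed by grid costs.",
    "This summary covers the main findings but omits the limitations section.",
]


def judge_score(text: str) -> float:
    """Stand-in for a judge-LLM assessment call returning a 1-10 score."""
    return random.uniform(1.0, 10.0)  # placeholder; no real model is queried here


def greedy_universal_phrase(max_words: int = 4) -> list[str]:
    """Greedily build a phrase that maximizes the mean judge score when appended."""
    phrase: list[str] = []
    for _ in range(max_words):
        best_word, best_score = CANDIDATE_WORDS[0], float("-inf")
        for word in CANDIDATE_WORDS:
            trial_suffix = " ".join(phrase + [word])
            # Average the judge's score over all training texts with the suffix appended.
            mean_score = sum(
                judge_score(text + " " + trial_suffix) for text in TRAIN_TEXTS
            ) / len(TRAIN_TEXTS)
            if mean_score > best_score:
                best_word, best_score = word, mean_score
        phrase.append(best_word)
    return phrase


if __name__ == "__main__":
    print("Candidate universal phrase:", " ".join(greedy_universal_phrase()))
```

The key property being probed is universality: the same short suffix is optimized against many texts at once, so it can later be appended to unseen submissions to inflate their scores.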

