Taxonomy-based CheckList for Large Language Model Evaluation
Feb. 20, 2024, 5:50 a.m. | Damin Zhang
cs.CL updates on arXiv.org arxiv.org
Abstract: As large language models (LLMs) have been used in many downstream tasks, their internal stereotypical representations may affect the fairness of their outputs. In this work, we introduce human knowledge into natural language interventions and study pre-trained language models' (LMs) behaviors within the context of gender bias. Inspired by CheckList behavioral testing, we present a checklist-style task that aims to probe and quantify LMs' unethical behaviors through question answering (QA). We design three comparison studies to …