April 16, 2024, 4:51 a.m. | Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

cs.CL updates on arXiv.org

arXiv:2404.09971v1 Announce Type: new
Abstract: Large language models (LLMs) are susceptible to hallucinations, which has sparked widespread efforts to detect and prevent them. Recent work attempts to mitigate hallucinations by intervening in the model's computation during generation, using various setups and heuristics. However, these works do not separate the different causes of hallucination. In this work, we first introduce an approach for constructing datasets, based on the model's own knowledge, for detection and intervention methods in closed-book and open-book question-answering settings. We then characterize …
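One plausible reading of "constructing datasets based on the model's own knowledge" is to partition QA pairs by whether the model already answers them correctly, so that hallucinations on known facts can be studied separately from errors caused by missing knowledge. The sketch below illustrates that idea only; the model name, prompt format, and string-matching heuristic are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch: split QA pairs by whether the model "knows" the
# answer, as one way to build knowledge-grounded datasets for studying
# hallucination detection and intervention. Not the authors' method.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper's models may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def model_knows(question: str, gold_answer: str) -> bool:
    """Greedy-decode a short answer and check it against the gold label."""
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,  # greedy decoding: probe what the model itself prefers
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, then do a crude substring match.
    completion = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return gold_answer.lower() in completion.lower()

# "known": errors here are hallucinations despite available knowledge;
# "unknown": errors may simply reflect missing knowledge.
qa_pairs = [("What is the capital of France?", "Paris")]
known = [qa for qa in qa_pairs if model_knows(*qa)]
unknown = [qa for qa in qa_pairs if not model_knows(*qa)]
```

Separating the two subsets lets detection and intervention methods be evaluated per cause rather than on a mixed pool, which is the distinction the abstract says prior work lacks.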

Tags: arxiv, benchmarks, cs.CL, hallucinations, LLMs
