Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models
May 7, 2024, 4:43 a.m. | Tobias Groot, Matias Valdenegro-Toro
Source: cs.LG updates on arXiv.org
Abstract: Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI through their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of LLMs (GPT4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting. We propose the new Japanese Uncertain Scenes (JUS) dataset, aimed at testing VLM capabilities via difficult queries and object …