Feb. 12, 2024, 5:43 a.m. | Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jor

cs.LG updates on arXiv.org arxiv.org

Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering): pre-trained foundational models automatically generate a set of questions and answers from the prompt, and output images are scored by whether the answers that a visual question answering model extracts from them are consistent with the prompt-based answers. This kind of evaluation naturally depends on the quality of the underlying QG and QA models. We …
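
As a rough illustration of the QG/A scoring idea described above, the following minimal sketch computes the fraction of prompt-derived questions whose expected answer matches the answer given for an image. Everything here is hypothetical and for illustration only: the function name `qga_score`, the stand-in `toy_vqa` callable, and the example questions are assumptions, not the paper's method or any library's API. A real pipeline would generate the question-answer pairs with a language model and answer them with a visual question answering model.

```python
from typing import Callable, List, Tuple


def qga_score(
    qa_pairs: List[Tuple[str, str]],
    vqa: Callable[[str], str],
) -> float:
    """Fraction of questions whose image-based answer matches the
    prompt-based expected answer (hypothetical scoring rule)."""
    if not qa_pairs:
        return 0.0
    matches = sum(
        1
        for question, expected in qa_pairs
        # Compare answers case-insensitively, ignoring surrounding spaces.
        if vqa(question).strip().lower() == expected.strip().lower()
    )
    return matches / len(qa_pairs)


# Hypothetical question-answer pairs derived from a prompt such as
# "a brown dog with a ball". In practice a QG model would produce these.
qa_pairs = [
    ("Is there a dog?", "yes"),
    ("Is the dog brown?", "yes"),
    ("Is there a ball?", "yes"),
]

# Stand-in for a VQA model applied to one generated image (assumption).
toy_answers = {
    "Is there a dog?": "yes",
    "Is the dog brown?": "no",
    "Is there a ball?": "Yes",
}


def toy_vqa(question: str) -> str:
    return toy_answers[question]


score = qga_score(qa_pairs, toy_vqa)
print(f"faithfulness score: {score:.2f}")
```

Because the score is only as good as the questions and the answers fed into it, a sketch like this makes the dependence on the underlying QG and QA models concrete: a wrong question or a wrong VQA answer changes the score directly.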

