Autonomous visual information seeking with large language models
Google AI Blog (ai.googleblog.com)
There has been great progress in adapting large language models (LLMs) to accommodate multimodal inputs for tasks such as image captioning, visual question answering (VQA), and open-vocabulary recognition. Despite these achievements, current state-of-the-art vision-language models (VLMs) perform inadequately on visual information seeking datasets, such as Infoseek and OK-VQA, where external knowledge is required to answer the questions.
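To make the distinction concrete, here is a minimal sketch contrasting a perception-only VQA query with a visual information seeking query in the style of OK-VQA and Infoseek. The filenames and question wordings are hypothetical illustrations, not taken from either dataset.

```python
# Hypothetical examples illustrating why visual information seeking
# is harder than perception-only VQA (filenames/questions invented).
perception_only_query = {
    "image": "bird_photo.jpg",  # hypothetical filename
    "question": "What color is the bird?",
    # Answerable directly from the pixels; no outside knowledge needed.
}

info_seeking_query = {
    "image": "bird_photo.jpg",  # same hypothetical image
    "question": "On which continent does this bird species primarily breed?",
    # Requires first recognizing the species from the image, then
    # consulting external knowledge (e.g., a knowledge base or web
    # search) that is not present in the pixels themselves.
}

def needs_external_knowledge(query: dict) -> bool:
    """Toy heuristic marking queries whose answers are not visible
    in the image (a stand-in for the real dataset annotations)."""
    knowledge_cues = ("species", "breed", "invented", "founded", "named after")
    return any(cue in query["question"].lower() for cue in knowledge_cues)
```

The gap the blurb describes is exactly the second kind of query: a VLM may caption the image correctly yet still lack the factual knowledge needed to answer.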