Modular visual question answering via code generation
Google AI Blog (ai.googleblog.com)
Visual question answering (VQA) is a machine learning task that requires a model to answer a question about an image or a set of images. Conventional VQA approaches need a large amount of labeled training data consisting of thousands of human-annotated question-answer pairs associated with images. In recent years, advances in large-scale pre-training have led to the development of VQA methods that …
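The modular approach the title refers to has a code-generating language model write a short program that composes pretrained vision primitives to answer a question, instead of training a monolithic VQA model on thousands of annotated pairs. The sketch below illustrates the idea only; the primitive names (`query`, `get_objects`) are hypothetical stand-ins, stubbed with canned outputs so the example runs without any real vision models.

```python
def query(image, question):
    # Hypothetical stand-in for a pretrained single-image VQA model.
    return image["facts"].get(question, "unknown")

def get_objects(image):
    # Hypothetical stand-in for an object detector.
    return image["objects"]

# A program a code-generating LLM might emit for the multi-image question:
# "Are there more cats in the left image than in the right image?"
def generated_program(left_image, right_image):
    left_cats = sum(1 for obj in get_objects(left_image) if obj == "cat")
    right_cats = sum(1 for obj in get_objects(right_image) if obj == "cat")
    return "yes" if left_cats > right_cats else "no"

# Dummy "images" standing in for real pixels plus model outputs.
left = {"objects": ["cat", "cat", "dog"], "facts": {}}
right = {"objects": ["cat"], "facts": {}}
print(generated_program(left, right))  # prints "yes"
```

Because the reasoning lives in generated code rather than in model weights, the same frozen vision primitives can be recombined for counting, comparison, or multi-image questions without task-specific training data.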