Modular visual question answering via code generation
Google AI Blog (ai.googleblog.com)
Visual question answering (VQA) is a machine learning task that requires a model to answer a question about an image or a set of images. Conventional VQA approaches need a large amount of labeled training data consisting of thousands of human-annotated question-answer pairs associated with images. In recent years, advances in large-scale pre-training have led to the development of VQA methods that …
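The title's "via code generation" idea can be sketched as follows: a language model emits a short program that composes vision primitives, and executing that program produces the answer. This is a minimal, hypothetical illustration; the primitive names (`find_objects`, `count`) and the toy image representation are assumptions for demonstration, not APIs from the post.

```python
def find_objects(image, label):
    # Stub detector: here an "image" is just a dict mapping object
    # labels to lists of bounding boxes, so the pipeline is runnable.
    return image.get(label, [])

def count(objects):
    # Counting primitive a generated program can call.
    return len(objects)

# A program a code-generating model might emit for
# the question "How many dogs are in the image?"
GENERATED_PROGRAM = """
dogs = find_objects(image, "dog")
answer = count(dogs)
"""

def answer_question(image, program):
    # Execute the generated program against the vision primitives
    # and read off the "answer" variable it sets.
    scope = {"image": image, "find_objects": find_objects, "count": count}
    exec(program, scope)
    return scope["answer"]

toy_image = {
    "dog": [(10, 10, 40, 40), (60, 20, 90, 55)],
    "cat": [(5, 5, 20, 20)],
}
print(answer_question(toy_image, GENERATED_PROGRAM))  # prints 2
```

The appeal of this modular setup is that the question is decomposed into interpretable steps, rather than answered end-to-end by a single model trained on thousands of annotated question-answer pairs.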