Unifying image-caption and image-classification datasets with prefix conditioning
Google AI Blog ai.googleblog.com
Pre-training visual language (VL) models on web-scale image-caption datasets has recently emerged as a powerful alternative to traditional pre-training on image classification data. Image-caption datasets are considered to be more “open-domain” because they contain broader scene types and vocabulary words, which result in models with strong performance in few- and zero-shot recognition tasks. However, images with fine-grained class descriptions can be rare, and …