Web: http://ai.googleblog.com/2022/09/pali-scaling-language-image-learning-in.html

Sept. 15, 2022, 7:16 p.m. | Google AI (noreply@blogger.com)

Google AI Blog googleblog.com

Posted by Xi Chen and Xiao Wang, Software Engineers, Google Research

Advanced language models (e.g., GPT, GLaM, PaLM and T5) have demonstrated diverse capabilities and achieved impressive results across tasks and languages by scaling up their number of parameters. Vision-language (VL) models can benefit from similar scaling to address many tasks, such as image captioning, visual question answering (VQA), object recognition, and in-context optical character recognition (OCR). Increasing the success rates for these practical tasks is important …

computer vision image language machine learning multimodal learning scaling

