Scaling vision transformers to 22 billion parameters
Google AI Blog (ai.googleblog.com)
Large Language Models (LLMs) like PaLM or GPT-3 showed that scaling transformers to hundreds of billions of parameters improves performance and unlocks emergent abilities. The biggest dense models for image understanding, however, have reached only 4 billion parameters, despite research indicating that promising multimodal models like PaLI continue to benefit from scaling vision models alongside their language counterparts. Motivated by this, and the results from scaling LLMs, we …
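The parameter counts quoted above (hundreds of billions for LLMs, 4 billion for the largest dense vision models, 22 billion for the model in the title) follow largely from a transformer's depth and hidden width. A minimal back-of-envelope sketch, using a hypothetical ViT-22B-like configuration (depth 48, width 6144, MLP ratio 4; these numbers are assumptions for illustration, and the count ignores embeddings, biases, and the classification head):

```python
def transformer_params(depth: int, width: int, mlp_ratio: int = 4) -> int:
    """Rough parameter count for a dense transformer encoder.

    Per block: attention contributes 4 * width^2 (Q, K, V, output
    projections) and the MLP contributes 2 * mlp_ratio * width^2
    (up- and down-projections). Biases, norms, and embeddings are omitted.
    """
    per_block = 4 * width**2 + 2 * mlp_ratio * width**2
    return depth * per_block


# Hypothetical ViT-22B-like configuration (assumed, not taken from the post)
total = transformer_params(depth=48, width=6144)
print(f"{total:,}")  # roughly 21.7 billion parameters
```

Doubling the width quadruples the per-block count, which is why dense vision models jump from the ~4-billion to the ~22-billion range with comparatively modest changes to depth and width.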