http://arxiv.org/abs/2209.06794

Sept. 19, 2022

cs.CL updates on arXiv.org

Effective scaling and a flexible task interface enable large language models
to excel at many tasks. PaLI (Pathways Language and Image model) extends this
approach to the joint modeling of language and vision. PaLI generates text
based on visual and textual inputs, and with this interface performs many
vision, language, and multimodal tasks, in many languages. To train PaLI, we
make use of large pretrained encoder-decoder language models and Vision
Transformers (ViTs). This allows us to capitalize on their existing …


