ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding

Feb. 6, 2024, 5:55 a.m. | Guojun Wu

cs.CL updates on arXiv.org

Most multilingual vision-and-language (V&L) research aims to achieve multilingual and multimodal capabilities within one model. However, the scarcity of multilingual captions for images has hindered progress. To overcome this obstacle, we propose ICU, Image Caption Understanding, which divides a V&L task into two stages: a V&L model performs image captioning in English, and a multilingual language model (mLM), in turn, takes the caption as the alt text and performs cross-lingual language understanding. The burden of multilingual processing is lifted …
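The two-stage split described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ICUPipeline` class, the `Captioner` and `TextPairClassifier` interfaces, and both stub functions are hypothetical names introduced here, and the stubs stand in for a real English captioning model and a real multilingual language model, neither of which the excerpt specifies.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces (assumptions, not from the source):
# stage 1 maps raw image bytes to an English caption,
# stage 2 maps (English caption, text in any language) to a label.
Captioner = Callable[[bytes], str]
TextPairClassifier = Callable[[str, str], str]


@dataclass
class ICUPipeline:
    """Two-stage sketch: caption in English, then classify text only."""
    captioner: Captioner
    classifier: TextPairClassifier

    def predict(self, image: bytes, text: str) -> str:
        # Stage 1: the V&L model only ever produces English.
        caption = self.captioner(image)
        # Stage 2: the multilingual LM sees the caption as "alt text"
        # and handles the cross-lingual part on text alone.
        return self.classifier(caption, text)


def stub_captioner(image: bytes) -> str:
    # Placeholder for a real captioning model; ignores its input.
    return "a dog runs on the beach"


def stub_classifier(premise: str, hypothesis: str) -> str:
    # Toy rule standing in for a multilingual LM: it "knows" that the
    # German word "Hund" matches the English word "dog".
    if "dog" in premise and "Hund" in hypothesis:
        return "entailment"
    return "neutral"


pipeline = ICUPipeline(captioner=stub_captioner, classifier=stub_classifier)
label = pipeline.predict(b"", "Ein Hund läuft am Strand")
print(label)
```

Because the second stage consumes only text, either component could be swapped independently in a design like this; whether the paper's system is structured this way in detail is not stated in the excerpt.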


