Web: http://arxiv.org/abs/2201.10656

Jan. 27, 2022, 2:10 a.m. | Peixi Xiong, Yilin Shen, Hongxia Jin

cs.CV updates on arXiv.org arxiv.org

Learning to answer visual questions is a challenging task because the
multi-modal inputs lie in two different feature spaces. Moreover, reasoning in
visual question answering requires the model to understand both the image and
the question and to align them in the same space, rather than simply memorizing
statistics about the question-answer pairs. Thus, it is essential to find
component connections between different modalities and within each modality to
achieve better attention. Previous works learned attention weights directly on
the features. However, the improvement …
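
To make the baseline mentioned in the abstract concrete, here is a minimal sketch of attention weights computed directly on the features: image region features and a question vector are projected into one shared space, and a softmax over their dot products weights the regions. This is a generic illustration, not the method proposed in the paper. The function names, the projection matrices `w_img` and `w_q`, and all dimensions are assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def question_guided_attention(region_feats, question_vec, w_img, w_q):
    """Attend over image regions using the question as the query.

    region_feats: (num_regions, d_img) image region features
    question_vec: (d_q,) pooled question embedding
    w_img:        (d_img, d) hypothetical projection into the shared space
    w_q:          (d_q, d)   hypothetical projection into the shared space
    """
    # Project both modalities into the same d-dimensional space.
    img_proj = region_feats @ w_img              # (num_regions, d)
    q_proj = question_vec @ w_q                  # (d,)

    # Scaled dot-product score for each region against the question.
    scores = img_proj @ q_proj / np.sqrt(img_proj.shape[1])

    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)                    # (num_regions,)

    # Weighted sum of the original region features.
    attended = weights @ region_feats            # (d_img,)
    return weights, attended


# Random stand-in data; real features would come from image and text encoders.
rng = np.random.default_rng(0)
region_feats = rng.normal(size=(5, 8))
question_vec = rng.normal(size=(6,))
w_img = rng.normal(size=(8, 4))
w_q = rng.normal(size=(6, 4))

weights, attended = question_guided_attention(region_feats, question_vec, w_img, w_q)
```

Here `weights` holds one non-negative value per image region, and `attended` is the question-conditioned summary of the image. In a trained model the two projection matrices would be learned rather than sampled at random.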

