Web: http://arxiv.org/abs/2201.10654

Jan. 27, 2022, 2:10 a.m. | Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

cs.CV updates on arXiv.org

Visual Question Answering (VQA) has attracted considerable attention from both industry
and academia. As a multi-modality task, it is challenging since it requires not
only visual and textual understanding, but also the ability to align
cross-modality representations. Previous approaches rely heavily on
entity-level alignments, such as the correlations between visual regions
and their semantic labels, or the interactions between question words and object
features. These approaches aim to improve the cross-modality representations
while ignoring their internal relations. Instead, we propose to …


