SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering. (arXiv:2201.10654v1 [cs.CV])
Web: http://arxiv.org/abs/2201.10654
Jan. 27, 2022, 2:10 a.m. | Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu
cs.CV updates on arXiv.org
Visual Question Answering (VQA) attracts much attention from both industry
and academia. As a multi-modality task, it is challenging since it requires not
only visual and textual understanding, but also the ability to align
cross-modality representations. Previous approaches extensively employ
entity-level alignments, such as the correlations between the visual regions
and their semantic labels, or the interactions across question words and object
features. These attempts aim to improve the cross-modality representations,
while ignoring their internal relations. Instead, we propose to …