On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering. (arXiv:2201.03965v1 [cs.CV])
Jan. 12, 2022, 2:10 a.m. | Ankur Sikarwar, Gabriel Kreiman
In recent years, multi-modal transformers have shown significant progress in
Vision-Language tasks, such as Visual Question Answering (VQA), outperforming
previous architectures by a considerable margin. This improvement in VQA is
often attributed to the rich interactions between vision and language streams.
In this work, we investigate the efficacy of co-attention transformer layers in
helping the network focus on relevant regions while answering the question. We
generate visual attention maps using the question-conditioned image attention
scores in these co-attention layers. We …
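As a rough illustration of the step the abstract describes, here is a minimal sketch (not the authors' code) of how question-conditioned image attention scores from a co-attention layer could be aggregated into a visual attention map. The tensor shapes, the averaging over heads and question tokens, and the assumption that image regions form a regular grid are all illustrative assumptions, not details from the paper.

```python
import torch

def visual_attention_map(attn_scores: torch.Tensor, grid_h: int, grid_w: int) -> torch.Tensor:
    """Aggregate co-attention scores into a spatial attention map.

    attn_scores: (num_heads, num_question_tokens, num_image_regions)
        question-to-image attention weights from one co-attention layer.
    Returns a (grid_h, grid_w) map, assuming regions tile a regular grid.
    """
    # Average over attention heads and question tokens to get one weight per region.
    per_region = attn_scores.mean(dim=(0, 1))   # (num_image_regions,)
    # Renormalize so the map sums to 1.
    per_region = per_region / per_region.sum()
    return per_region.reshape(grid_h, grid_w)

# Example with random weights: 8 heads, 14 question tokens, a 7x7 region grid.
scores = torch.rand(8, 14, 49).softmax(dim=-1)
heatmap = visual_attention_map(scores, 7, 7)
print(heatmap.shape)  # torch.Size([7, 7])
```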