Jan. 12, 2022, 2:10 a.m. | Ankur Sikarwar, Gabriel Kreiman

cs.LG updates on arXiv.org

In recent years, multi-modal transformers have shown significant progress in
Vision-Language tasks, such as Visual Question Answering (VQA), outperforming
previous architectures by a considerable margin. This improvement in VQA is
often attributed to the rich interactions between vision and language streams.
In this work, we investigate the efficacy of co-attention transformer layers in
helping the network focus on relevant regions while answering the question. We
generate visual attention maps using the question-conditioned image attention
scores in these co-attention layers. We …
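To make the idea concrete, here is a minimal sketch of how question-conditioned image attention scores from a co-attention (cross-attention) layer can be aggregated into a visual attention map over image regions. The function name, feature shapes, and the single-head scaled dot-product formulation are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only: turn question-conditioned image attention scores
# into a per-region visual attention map. Shapes and names are hypothetical.
import torch
import torch.nn.functional as F

def visual_attention_map(question_feats, image_feats):
    """question_feats: (num_tokens, dim)  language stream
       image_feats:    (num_regions, dim) vision stream (e.g. detected regions)
       returns:        (num_regions,)     attention mass per image region
    """
    dim = question_feats.size(-1)
    # Cross-attention: queries from the question tokens, keys from the image regions.
    scores = question_feats @ image_feats.t() / dim ** 0.5   # (num_tokens, num_regions)
    attn = F.softmax(scores, dim=-1)
    # Average over question tokens to get one score per region.
    return attn.mean(dim=0)

# Toy usage with random features standing in for a real model's streams.
q = torch.randn(12, 768)   # 12 question tokens
v = torch.randn(36, 768)   # 36 image regions
print(visual_attention_map(q, v).shape)   # torch.Size([36])
```

In practice the scores would come from the trained co-attention layers of the multi-modal transformer (averaged over heads), and the resulting map can be projected back onto the image regions for visualization.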

arxiv attention cv transformer
