Web: http://arxiv.org/abs/2201.11316

Jan. 28, 2022, 2:11 a.m. | Moyuru Yamada, Vanessa D'Amario, Kentaro Takemoto, Xavier Boix, Tomotake Sasaki

cs.LG updates on arXiv.org

Transformer-based models achieve great performance on Visual Question
Answering (VQA). However, when we evaluate them on systematic generalization,
i.e., handling novel combinations of known concepts, their performance
degrades. Neural Module Networks (NMNs) are a promising approach to systematic
generalization that consists of composing modules, i.e., neural networks that
each tackle a sub-task. Inspired by Transformers and NMNs, we propose Transformer
Module Network (TMN), a novel Transformer-based model for VQA that dynamically
composes modules into a question-specific Transformer network. TMNs achieve
state-of-the-art …
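The key idea, composing a question-specific pipeline out of reusable sub-task modules, can be illustrated with a minimal toy sketch. Everything here is an illustrative assumption, not the paper's actual architecture: the module names, the dictionary "scene" representation, and the `compose` helper stand in for the learned Transformer modules and the program that selects them.

```python
# Toy sketch of NMN/TMN-style dynamic module composition (illustrative only;
# the real model uses learned Transformer modules, not hand-written filters).

def filter_color(objects, color):
    """Module: keep objects of a given color."""
    return [o for o in objects if o["color"] == color]

def filter_shape(objects, shape):
    """Module: keep objects of a given shape."""
    return [o for o in objects if o["shape"] == shape]

def count(objects):
    """Module: count the remaining objects."""
    return len(objects)

MODULES = {"filter_color": filter_color, "filter_shape": filter_shape, "count": count}

def compose(program):
    """Build a question-specific pipeline from a list of (module_name, args) steps."""
    def run(objects):
        state = objects
        for name, args in program:
            state = MODULES[name](state, *args)
        return state
    return run

scene = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]

# "How many red cubes are there?" -> filter_color(red), filter_shape(cube), count
pipeline = compose([("filter_color", ["red"]), ("filter_shape", ["cube"]), ("count", [])])
print(pipeline(scene))  # 1
```

Because modules are reused across questions, a pipeline for a novel combination of known sub-tasks (e.g., "how many blue spheres") is just a new composition of already-known modules, which is the systematic-generalization property the abstract targets.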

arxiv cv networks question answering transformer
