all AI news
Exploring Diverse Methods in Visual Question Answering
April 23, 2024, 4:46 a.m. | Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian
cs.CV updates on arXiv.org arxiv.org
Abstract: This study explores innovative methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Leveraging a balanced VQA dataset, we investigate three distinct strategies. Firstly, GAN-based approaches aim to generate answer embeddings conditioned on image and question inputs, showing potential but struggling with more complex tasks. Secondly, autoencoder-based techniques focus on learning optimal embeddings for questions and images, achieving comparable results with GAN due to better ability on complex …
abstract adversarial aim arxiv attention attention mechanisms autoencoders cs.ai cs.cl cs.cv dataset diverse embeddings gan gans generate generative generative adversarial networks image improving inputs networks question question answering strategies study type visual vqa
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Codec Avatars Research Engineer
@ Meta | Pittsburgh, PA