April 17, 2024, 4:42 a.m. | Elham J. Barezi, Parisa Kordjamshidi

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.10226v1 Announce Type: cross
Abstract: We analyze knowledge-based visual question answering, for which given a question, the models need to ground it into the visual modality and retrieve the relevant knowledge from a given large knowledge base (KB) to be able to answer. Our analysis has two folds, one based on designing neural architectures and training them from scratch, and another based on large pre-trained language models (LLMs). Our research questions are: 1) Can we effectively augment models by explicit …

abstract analysis analyze arxiv cs.ai cs.cl cs.cv cs.lg gap knowledge knowledge base question question answering reasoning type visual

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Data Engineer - Takealot Group (Takealot.com | Superbalist.com | Mr D Food)

@ takealot.com | Cape Town