May 16, 2022, 1:10 a.m. | Zenan Xu, Wanjun Zhong, Qinliang Su, Zijing Ou, Fuwei Zhang

cs.CV updates on arXiv.org (arxiv.org)

A key challenge in video question answering is achieving cross-modal semantic
alignment between textual concepts and their corresponding visual objects.
Existing methods mostly seek to align word representations with video regions.
However, word representations often cannot convey a complete description of
textual concepts, which are generally expressed by compositions of certain
words. To address this issue, we propose to first build
a syntactic dependency tree for each question with an off-the-shelf …

arxiv cv hypergraph modeling question answering semantic video
