Aug. 10, 2022, 1:12 a.m. | Haoran Wang, Di Xu, Dongliang He, Fu Li, Zhong Ji, Jungong Han, Errui Ding

cs.CV updates on arXiv.org

Video-text retrieval (VTR) is an attractive yet challenging task for
multi-modal understanding, which aims to retrieve the relevant video (text)
given a text (video) query. Existing methods typically employ completely heterogeneous
visual-textual information to align video and text, whilst lacking the
awareness of homogeneous high-level semantic information residing in both
modalities. To fill this gap, in this work, we propose a novel
visual-linguistic aligning model named HiSE for VTR, which improves the
cross-modal representation by incorporating explicit high-level semantics.
First, we …

