Boosting Video-Text Retrieval with Explicit High-Level Semantics. (arXiv:2208.04215v2 [cs.CV] UPDATED)
Aug. 10, 2022, 1:12 a.m. | Haoran Wang, Di Xu, Dongliang He, Fu Li, Zhong Ji, Jungong Han, Errui Ding
cs.CV updates on arXiv.org
Video-text retrieval (VTR) is an attractive yet challenging task in
multi-modal understanding: given a text (video) query, it aims to retrieve the
relevant video (text). Existing methods typically align video and text using
completely heterogeneous visual-textual information, while lacking awareness
of the homogeneous high-level semantic information residing in both
modalities. To fill this gap, we propose a novel
visual-linguistic aligning model named HiSE for VTR, which improves the
cross-modal representation by incorporating explicit high-level semantics.
First, we …