all AI news
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
April 2, 2024, 7:49 p.m. | Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
cs.CV updates on arXiv.org arxiv.org
Abstract: Video grounding aims to localize a spatio-temporal section in a video corresponding to an input text query. This paper addresses a critical limitation in current video grounding methodologies by introducing an Open-Vocabulary Spatio-Temporal Video Grounding task. Unlike prevalent closed-set approaches that struggle with open-vocabulary scenarios due to limited training data and predefined vocabularies, our model leverages pre-trained representations from foundational spatial grounding models. This empowers it to effectively bridge the semantic gap between natural language …
abstract arxiv cs.cv current paper query set struggle temporal text training type video
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Senior Data Engineer
@ Quantexa | Sydney, New South Wales, Australia
Staff Analytics Engineer
@ Warner Bros. Discovery | NY New York 230 Park Avenue South