March 19, 2024, 4:49 a.m. | Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng, Jian-Fang Hu

cs.CV updates on arXiv.org

arXiv:2403.11463v1 Announce Type: new
Abstract: Video Paragraph Grounding (VPG) is an emerging task in video-language understanding, which aims at localizing multiple sentences with semantic relations and temporal order in an untrimmed video. However, existing VPG approaches rely heavily on a considerable number of temporal labels that are laborious and time-consuming to acquire. In this work, we introduce and explore Weakly-Supervised Video Paragraph Grounding (WSVPG) to eliminate the need for temporal annotations. Different from previous weakly-supervised grounding frameworks based on …
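To make the task setup concrete, here is a minimal sketch of the data involved in VPG versus its weakly-supervised variant, as described in the abstract. This is only an illustration of the supervision difference, not the paper's method; all names (`VPGExample`, `SentenceSpan`, `to_weakly_supervised`) and the example values are hypothetical.

```python
# Illustrative sketch of VPG data and the weak-supervision setting (WSVPG).
# Hypothetical names and values; not taken from the paper.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SentenceSpan:
    """One sentence of the paragraph query and its grounded temporal segment."""
    sentence: str
    start_sec: float  # segment start in the untrimmed video
    end_sec: float    # segment end


@dataclass
class VPGExample:
    """A fully supervised VPG sample: video + ordered sentences + temporal labels."""
    video_id: str
    sentences: List[str]                  # semantically related, temporally ordered
    segments: List[Tuple[float, float]]   # one (start_sec, end_sec) per sentence


def to_weakly_supervised(example: VPGExample) -> Tuple[str, List[str]]:
    """WSVPG drops the temporal labels: only the video-paragraph pairing
    remains available for training."""
    return example.video_id, example.sentences


if __name__ == "__main__":
    ex = VPGExample(
        video_id="v_0001",
        sentences=["A man enters the kitchen.", "He starts chopping vegetables."],
        segments=[(0.0, 4.2), (4.2, 12.8)],
    )
    # In the weakly-supervised setting, the (start, end) labels are unavailable.
    print(to_weakly_supervised(ex))
```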

