Web: http://arxiv.org/abs/2209.10918

Sept. 23, 2022, 1:14 a.m. | Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

cs.CV updates on arXiv.org

Video temporal grounding (VTG) aims to localize temporal moments in an
untrimmed video according to a natural language (NL) description. Because
real-world applications provide never-ending video streams, there is growing
demand for temporal grounding in long-form videos, which poses two major
challenges: (1) the long video length makes it difficult to process the entire
video without lowering the sampling rate, and it imposes a high computational
burden; (2) accurate multi-modal alignment is more challenging as the number of
moment candidates …


