Web: http://arxiv.org/abs/2201.10168

Jan. 26, 2022, 2:10 a.m. | Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim

cs.CV updates on arXiv.org

We present a new paradigm named explore-and-match for video grounding, which
aims to seamlessly unify two streams of video grounding methods: proposal-based
and proposal-free. To achieve this goal, we formulate video grounding as a set
prediction problem and design an end-to-end trainable Video Grounding
Transformer (VidGTR) that can utilize the architectural strengths of rich
contextualization and parallel decoding for set prediction. The overall
training is balanced by two key losses that play different roles, namely span
localization loss and set …
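
To make the set-prediction framing concrete, here is a minimal sketch of one common way such formulations pair predicted spans with ground-truth spans: one-to-one matching on a span cost. Everything in it is an assumption for illustration, not the authors' implementation: the cost (L1 distance plus one minus temporal IoU), the brute-force matcher, and the names `temporal_iou`, `span_cost`, and `match_spans` are hypothetical and do not come from the abstract, which is truncated before it names the second loss.

```python
from itertools import permutations


def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def span_cost(pred, gt):
    """Assumed cost: L1 distance on the endpoints plus (1 - temporal IoU)."""
    l1 = abs(pred[0] - gt[0]) + abs(pred[1] - gt[1])
    return l1 + (1.0 - temporal_iou(pred, gt))


def match_spans(preds, gts):
    """Match each ground-truth span to a distinct predicted span.

    Brute force over all assignments, so only suitable for tiny sets.
    Assumes len(preds) >= len(gts). Returns (assignment, total_cost),
    where assignment[i] is the index of the prediction matched to gts[i].
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(span_cost(preds[p], g) for p, g in zip(perm, gts))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost


# Three candidate spans (normalized to [0, 1]) and one annotated moment.
preds = [(0.10, 0.30), (0.55, 0.90), (0.00, 1.00)]
gts = [(0.60, 0.90)]
assignment, cost = match_spans(preds, gts)
```

In this toy example the second prediction is selected because it overlaps the annotated moment most closely. A practical system would likely replace the brute-force search with a polynomial-time assignment solver.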


