April 2, 2024, 7:49 p.m. | WonJun Moon, Sangeek Hyun, SuBeen Lee, Jae-Pil Heo

cs.CV updates on arXiv.org arxiv.org

arXiv:2311.08835v3 Announce Type: replace
Abstract: Video Temporal Grounding is to identify specific moments or highlights from a video corresponding to textual descriptions. Typical approaches in temporal grounding treat all video clips equally during the encoding process regardless of their semantic relevance with the text query. Therefore, we propose Correlation-Guided DEtection TRansformer(CG-DETR), exploring to provide clues for query-associated video clips within the cross-modal attention. First, we design an adaptive cross-attention with dummy tokens. Dummy tokens conditioned by text query take portions …

abstract arxiv correlation cs.cv detection encoding highlights identify moments process query representation representation learning semantic temporal text textual type video

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

#13721 - Data Engineer - AI Model Testing

@ Qualitest | Miami, Florida, United States

Elasticsearch Administrator

@ ManTech | 201BF - Customer Site, Chantilly, VA