all AI news
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
March 7, 2024, 5:46 a.m. | Guozhang Li, Xinpeng Ding, De Cheng, Jie Li, Nannan Wang, Xinbo Gao
cs.CV updates on arXiv.org arxiv.org
Abstract: Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level annotation, explicit-supervision methods, i.e., generating pseudo-temporal boundaries for training, have achieved great success. However, data augmentations in these methods might disrupt critical temporal information, yielding poor pseudo boundaries. In this paper, we propose a new perspective that maintains the integrity of the original temporal content while …
abstract annotation annotations arxiv bridge cs.cv detection etc expand gap language language model large language large language model multimodal multimodal large language model struggle supervision temporal training type video
More from arxiv.org / cs.CV updates on arXiv.org
Multi-View Spectrogram Transformer for Respiratory Sound Classification
2 days, 22 hours ago |
arxiv.org
GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation
2 days, 22 hours ago |
arxiv.org
OTMatch: Improving Semi-Supervised Learning with Optimal Transport
2 days, 22 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Senior Machine Learning Engineer
@ GPTZero | Toronto, Canada
ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)
@ HelloBetter | Remote
Doctoral Researcher (m/f/div) in Automated Processing of Bioimages
@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena
Seeking Developers and Engineers for AI T-Shirt Generator Project
@ Chevon Hicks | Remote
Senior Applied Data Scientist
@ dunnhumby | London
Principal Data Architect - Azure & Big Data
@ MGM Resorts International | Home Office - US, NV