all AI news
HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models
April 9, 2024, 4:43 a.m. | Yimu Wang, Shuai Yuan, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu
cs.LG updates on arXiv.org arxiv.org
Abstract: While recent progress in video-text retrieval has been driven by the exploration of powerful model architectures and training strategies, the representation learning ability of video-text retrieval models is still limited due to low-quality and scarce training data annotations. To address this issue, we present a novel video-text learning paradigm, HaVTR, which augments video and text data to learn more generalized features. Specifically, we first adopt a simple augmentation method, which generates self-similar data by randomly …
abstract annotations architectures arxiv augmentation cs.cl cs.cv cs.ir cs.lg data exploration foundation improving issue low progress quality representation representation learning retrieval strategies text through training training data type video
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Senior Machine Learning Engineer
@ Samsara | Canada - Remote