May 19, 2022, 1:11 a.m. | Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou

cs.CL updates on arXiv.org arxiv.org

Recently, with the introduction of large-scale datasets and strong transformer networks,
video-language pre-training has shown great success, especially for retrieval.
Yet existing video-language transformer models do not perform explicit
fine-grained semantic alignment. In this work, we present Object-aware
Transformers, an object-centric approach that extends the video-language
transformer to incorporate object representations. The key idea is to leverage
bounding boxes and object tags to guide the training process. We evaluate our
model on three standard sub-tasks of video-text matching across four widely used benchmarks. …
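The abstract's key idea is to inject object information (region features, bounding boxes, tags) into a video-language transformer. A minimal sketch of one plausible fusion scheme, assuming PyTorch and invented names/dimensions throughout (this is not the paper's actual architecture): project each object's region feature, add an embedding of its normalized box coordinates, and append the resulting object tokens to the video tokens before a standard transformer encoder.

```python
import torch
import torch.nn as nn

class ObjectAwareFusion(nn.Module):
    """Hypothetical sketch: fuse per-object region features with their
    bounding-box geometry and append them as extra tokens alongside
    video tokens, ahead of a standard transformer encoder.
    All names and dimensions are assumptions, not the paper's."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        # embed 4-d normalized box coordinates (x1, y1, x2, y2) into model dim
        self.box_embed = nn.Linear(4, dim)
        # project detector region features into model dim
        self.obj_proj = nn.Linear(dim, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, video_tokens, obj_feats, obj_boxes):
        # video_tokens: (B, T, dim)  patch/frame tokens
        # obj_feats:    (B, K, dim)  per-object region features
        # obj_boxes:    (B, K, 4)    normalized box coordinates
        obj_tokens = self.obj_proj(obj_feats) + self.box_embed(obj_boxes)
        # joint sequence lets attention align video and object tokens
        tokens = torch.cat([video_tokens, obj_tokens], dim=1)  # (B, T+K, dim)
        return self.encoder(tokens)

# toy shapes: batch 2, 8 video tokens, 5 object tokens
model = ObjectAwareFusion()
out = model(torch.randn(2, 8, 256),
            torch.randn(2, 5, 256),
            torch.rand(2, 5, 4))
```

Object tags (detector class names) would additionally be tokenized on the text side; here only the visual fusion path is sketched.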

