May 19, 2022, 1:11 a.m. | Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou

cs.CL updates on arXiv.org arxiv.org

Recently, with the introduction of large-scale datasets and strong transformer networks,
video-language pre-training has shown great success, especially for retrieval.
Yet existing video-language transformer models do not perform explicit
fine-grained semantic alignment. In this work, we present Object-aware
Transformers, an object-centric approach that extends the video-language
transformer to incorporate object representations. The key idea is to leverage
bounding boxes and object tags to guide the training process. We evaluate our
model on three standard sub-tasks of video-text matching across four widely used benchmarks. …
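The abstract's key idea is to inject object information (region features, bounding boxes, tags) into a video-language transformer. A minimal sketch of one plausible fusion scheme, assuming PyTorch and invented names/dimensions throughout (this is not the paper's actual architecture): project each object's region feature, add an embedding of its normalized box coordinates, and append the resulting object tokens to the video tokens before a standard transformer encoder.

```python
import torch
import torch.nn as nn

class ObjectAwareFusion(nn.Module):
    """Hypothetical sketch: fuse per-object region features with their
    bounding-box geometry and append them as extra tokens alongside
    video tokens, ahead of a standard transformer encoder.
    All names and dimensions are assumptions, not the paper's."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        # embed 4-d normalized box coordinates (x1, y1, x2, y2) into model dim
        self.box_embed = nn.Linear(4, dim)
        # project detector region features into model dim
        self.obj_proj = nn.Linear(dim, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, video_tokens, obj_feats, obj_boxes):
        # video_tokens: (B, T, dim)  patch/frame tokens
        # obj_feats:    (B, K, dim)  per-object region features
        # obj_boxes:    (B, K, 4)    normalized box coordinates
        obj_tokens = self.obj_proj(obj_feats) + self.box_embed(obj_boxes)
        # joint sequence lets attention align video and object tokens
        tokens = torch.cat([video_tokens, obj_tokens], dim=1)  # (B, T+K, dim)
        return self.encoder(tokens)

# toy shapes: batch 2, 8 video tokens, 5 object tokens
model = ObjectAwareFusion()
out = model(torch.randn(2, 8, 256),
            torch.randn(2, 5, 256),
            torch.rand(2, 5, 4))
```

Object tags (detector class names) would additionally be tokenized on the text side; here only the visual fusion path is sketched.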

