Unsupervised Open-Vocabulary Object Localization in Videos

June 27, 2024, 4:47 a.m. | Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele

cs.CV updates on arXiv.org

arXiv:2309.09858v2 Announce Type: replace
Abstract: In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via an object-centric approach with slot attention and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to read localized semantic information from the pre-trained CLIP model. The resulting video object localization is entirely …
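The second step in the abstract, assigning text to slots, can be pictured as a nearest-neighbor lookup in a shared embedding space. The sketch below is only an illustration under stated assumptions, not the authors' method: the function name `assign_labels`, the toy 3-dimensional vectors, and the use of plain cosine similarity are all hypothetical stand-ins. In a real pipeline the slot features would come from a slot-attention model and the text features from the CLIP text encoder.

```python
import numpy as np

def assign_labels(slot_feats, text_feats, labels):
    """Give each slot the label whose text embedding is closest in cosine similarity.

    slot_feats: (num_slots, dim) array, one feature vector per slot (assumed given).
    text_feats: (num_labels, dim) array, one embedding per candidate label (assumed given).
    labels:     list of num_labels strings.
    """
    # L2-normalize rows so that a dot product equals cosine similarity.
    s = slot_feats / np.linalg.norm(slot_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sim = s @ t.T                      # shape: (num_slots, num_labels)
    best = sim.argmax(axis=1)          # index of the most similar label per slot
    return [labels[i] for i in best], sim

# Toy stand-ins for real embeddings (hypothetical values, not CLIP outputs).
labels = ["dog", "car", "tree"]
text_feats = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
slot_feats = np.array([[0.9, 0.1, 0.0],
                       [0.0, 0.2, 2.0]])

names, sim = assign_labels(slot_feats, text_feats, labels)
print(names)
```

Because the label list is just a set of strings embedded at query time, this kind of lookup is what makes an open vocabulary possible in principle: changing the candidate labels requires no retraining of the slot model.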


