Unsupervised Open-Vocabulary Object Localization in Videos
June 27, 2024, 4:47 a.m. | Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele
Source: cs.CV updates on arXiv.org
Abstract: In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via an object-centric approach with slot attention and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to read localized semantic information from the pre-trained CLIP model. The resulting video object localization is entirely …
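To make the second stage of the abstract concrete, here is a minimal sketch of one plausible way to attach text to slots: choose, for each slot, the candidate label whose embedding has the highest cosine similarity with the slot's feature vector. This is an illustrative assumption and not the authors' procedure, since the abstract does not say how semantic information is read from CLIP. The names `cosine`, `assign_labels`, `slot_features`, and `text_embeddings` are hypothetical, and the toy vectors stand in for real slot and CLIP text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_labels(slot_features, text_embeddings):
    """Return, for each slot, the label whose embedding is most similar.

    slot_features:   list of vectors, one per slot (hypothetical stand-ins
                     for features pooled over each slot's attention mask).
    text_embeddings: dict mapping a label to its vector (hypothetical
                     stand-ins for text-encoder outputs).
    """
    labels = []
    for slot in slot_features:
        best = max(
            text_embeddings,
            key=lambda name: cosine(slot, text_embeddings[name]),
        )
        labels.append(best)
    return labels


# Toy data only: 3-dimensional vectors chosen by hand for illustration.
slot_features = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2]]
text_embeddings = {
    "dog": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "tree": [0.0, 0.0, 1.0],
}
labels = assign_labels(slot_features, text_embeddings)
```

In a real pipeline the slot vectors would come from the object-centric model and the label vectors from a pre-trained text encoder, and both would need to live in the same embedding space for the similarity to be meaningful.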