Zero-Shot Video Captioning with Evolving Pseudo-Tokens (arXiv:2207.11100v2 [cs.CV], updated)

July 29, 2022, 1:12 a.m. | Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf

cs.CV updates on arXiv.org

We introduce a zero-shot video captioning method that employs two frozen
networks: the GPT-2 language model and the CLIP image-text matching model. The
matching score is used to steer the language model toward generating a sentence
that has a high average matching score to a subset of the video frames. Unlike
zero-shot image captioning methods, our work considers the entire sentence at
once. This is achieved by optimizing, during the generation process, part of
the prompt from scratch, by modifying …
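
The quantity the abstract steers toward, a sentence's average matching score over a subset of video frames, can be illustrated with a small sketch. This is not the authors' implementation: the toy vectors stand in for CLIP text and image embeddings, cosine similarity is assumed as the matching score, and the function names (`cosine_similarity`, `average_matching_score`, `pick_best_sentence`) are hypothetical. Ranking a fixed set of candidate sentences is used here only to show how the score is computed; the paper instead optimizes part of the prompt during generation.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (assumed stand-in for a CLIP matching score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_matching_score(
    sentence_emb: Sequence[float], frame_embs: Sequence[Sequence[float]]
) -> float:
    """Mean matching score of one sentence embedding against a subset of frame embeddings."""
    if not frame_embs:
        raise ValueError("at least one frame embedding is required")
    total = sum(cosine_similarity(sentence_emb, frame) for frame in frame_embs)
    return total / len(frame_embs)


def pick_best_sentence(
    candidates: Dict[str, Sequence[float]], frame_embs: Sequence[Sequence[float]]
) -> str:
    """Return the candidate sentence with the highest average score (illustrative reranking only)."""
    return max(
        candidates,
        key=lambda text: average_matching_score(candidates[text], frame_embs),
    )


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from an image-text model.
    frames = [[1.0, 0.0], [0.0, 1.0]]
    candidates = {
        "matches both frames": [1.0, 1.0],
        "matches first frame only": [1.0, 0.0],
    }
    print(pick_best_sentence(candidates, frames))
```

Averaging over frames rewards a sentence that is moderately consistent with every sampled frame over one that matches a single frame perfectly, which is why the first candidate wins in the toy data.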

