Zero-Shot Video Captioning with Evolving Pseudo-Tokens. (arXiv:2207.11100v1 [cs.CV]) | allainews.com

July 25, 2022, 1:12 a.m. | Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf

cs.CV updates on arXiv.org arxiv.org

We introduce a zero-shot video captioning method that employs two frozen
networks: the GPT-2 language model and the CLIP image-text matching model. The
matching score is used to steer the language model toward generating a sentence
that has a high average matching score to a subset of the video frames. Unlike
zero-shot image captioning methods, our work considers the entire sentence at
once. This is achieved by optimizing, during the generation process, part of
the prompt from scratch, by modifying …

arxiv captioning cv tokens video

More from arxiv.org / cs.CV updates on arXiv.org

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception 1 day, 9 hours ago | arxiv.org

agent arxiv autonomous cs.cl +8

Low-resolution Prior Equilibrium Network for CT Reconstruction 1 day, 9 hours ago | arxiv.org

abstract arxiv cs.cv deep learning +17

MARformer: An Efficient Metal Artifact Reduction Transformer for Dental CBCT Images 1 day, 9 hours ago | arxiv.org

abstract artifact arxiv cs.cv +16

Back to Basics: Fast Denoising Iterative Algorithm 1 day, 9 hours ago | arxiv.org

abstract algorithm arxiv basics +10

Predicting Thrombectomy Recanalization from CT Imaging Using Deep Learning Models 1 day, 9 hours ago | arxiv.org

abstract arxiv benefit clinicians +10

Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models 1 day, 9 hours ago | arxiv.org

abstract adversarial adversarial examples art +20

Methods and strategies for improving the novel view synthesis quality of neural radiation field 1 day, 9 hours ago | arxiv.org

abstract application arxiv attention +16

AffordanceLLM: Grounding Affordance from Vision Language Models 1 day, 9 hours ago | arxiv.org

arxiv cs.cv cs.ro language +3

DualFluidNet: an Attention-based Dual-pipeline Network for FLuid Simulation 1 day, 9 hours ago | arxiv.org

arxiv attention cs.cv cs.gr +4

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Lead Software Engineer - Artificial Intelligence, LLM

@ OpenText | Hyderabad, TG, IN

View on ai-jobs.net

Lead Software Engineer- Python Data Engineer

@ JPMorgan Chase & Co. | GLASGOW, LANARKSHIRE, United Kingdom

View on ai-jobs.net

Data Analyst (m/w/d)

@ Collaboration Betters The World | Berlin, Germany

View on ai-jobs.net

Data Engineer, Quality Assurance

@ Informa Group Plc. | Boulder, CO, United States

View on ai-jobs.net

Director, Data Science - Marketing

@ Dropbox | Remote - Canada

View on ai-jobs.net