July 25, 2022, 1:12 a.m. | Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf

cs.CV updates on arXiv.org

We introduce a zero-shot video captioning method that employs two frozen
networks: the GPT-2 language model and the CLIP image-text matching model. The
matching score is used to steer the language model toward generating a sentence
that has a high average matching score to a subset of the video frames. Unlike
zero-shot image captioning methods, our work considers the entire sentence at
once. This is achieved by optimizing, during the generation process, part of
the prompt from scratch, by modifying …
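
To make the scoring quantity concrete, here is a minimal sketch of an "average matching score" between one sentence and a subset of video frames. It is not the authors' method: the abstract describes optimizing part of the prompt during generation, whereas this sketch only scores and ranks finished candidate sentences. The embeddings are hand-written stand-ins for what a CLIP-style encoder might produce, and the names `cosine`, `average_matching_score`, and `pick_best_sentence` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_matching_score(sentence_emb, frame_embs):
    """Mean cosine similarity of one sentence embedding to each frame embedding."""
    return sum(cosine(sentence_emb, f) for f in frame_embs) / len(frame_embs)


def pick_best_sentence(candidates, frame_embs):
    """Return the key of the candidate with the highest average matching score."""
    return max(
        candidates,
        key=lambda name: average_matching_score(candidates[name], frame_embs),
    )


# Toy stand-ins (assumption): two sampled frames and two candidate sentences,
# each already embedded in a shared 2-D space.
frames = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "matches both frames": [1.0, 1.0],
    "matches first frame only": [1.0, 0.0],
}

best = pick_best_sentence(candidates, frames)
```

Averaging over frames favors a sentence that fits the sampled frames jointly over one that fits a single frame perfectly, which is why the first candidate wins in this toy setup.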

