CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter. (arXiv:2111.15162v2 [cs.CV] UPDATED)
cs.CV updates on arXiv.org
For video captioning, "pre-training and fine-tuning" has become a de facto
paradigm, where ImageNet Pre-training (INP) is usually used to encode the video
content, and a task-oriented network is then fine-tuned from scratch to handle
caption generation. This paper first investigates the impact of the recently
proposed CLIP (Contrastive Language-Image Pre-training) on video captioning.
Through the empirical study on INP vs. CLIP, we identify the potential
deficiencies of INP and explore the key factors for accurate description
generation. The results …
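The paradigm the abstract describes, encoding video frames with a pre-trained image backbone and then fine-tuning a task-oriented captioning network on top, can be sketched in a few lines. The sketch below contrasts the two feature-extraction routes being compared (INP vs. CLIP); the checkpoints, dummy frames, and feature shapes are illustrative assumptions, not the paper's actual setup.

import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights
from transformers import CLIPModel, CLIPImageProcessor

# Stand-in "video": a handful of blank frames; a real pipeline would sample
# frames from a clip before encoding them.
frames = [Image.new("RGB", (224, 224)) for _ in range(8)]

# --- INP route: ImageNet-pretrained ResNet-50, pooled visual features ---
inp_weights = ResNet50_Weights.IMAGENET1K_V2
inp_encoder = resnet50(weights=inp_weights)
inp_encoder.fc = torch.nn.Identity()  # drop the ImageNet classification head
inp_encoder.eval()
preprocess = inp_weights.transforms()
with torch.no_grad():
    inp_feats = torch.stack(
        [inp_encoder(preprocess(f).unsqueeze(0)).squeeze(0) for f in frames]
    )  # shape: (num_frames, 2048)

# --- CLIP route: image encoder trained with language supervision ---
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_proc = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
with torch.no_grad():
    pixels = clip_proc(images=frames, return_tensors="pt").pixel_values
    clip_feats = clip.get_image_features(pixel_values=pixels)  # (num_frames, 512)

# Either frame-feature sequence would then be fed to a task-oriented caption
# decoder that is fine-tuned on the captioning data.
print(inp_feats.shape, clip_feats.shape)

Either sequence of per-frame features serves as the "video content" representation; the paper's empirical study concerns how the choice of pre-training (INP vs. CLIP) affects the quality of the generated descriptions.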