Aug. 9, 2022, 1:13 a.m. | Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le

cs.CV updates on arXiv.org

In this paper, we leverage the human perceiving process, which involves vision and
language interaction, to generate coherent paragraph descriptions of untrimmed
videos. We propose vision-language (VL) features consisting of two modalities:
(i) a vision modality that captures the global visual content of the entire
scene, and (ii) a language modality that extracts descriptions of scene
elements, covering both human and non-human objects (e.g., animals, vehicles)
and both visual and non-visual elements (e.g., relations, activities).
Furthermore, we propose to train our proposed …
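The excerpt does not describe the authors' exact architecture, but the two-modality idea can be illustrated with a minimal sketch: a global visual feature for the whole scene is fused with embedded text descriptions of scene elements to form a single VL feature per clip. All module names, dimensions, and the simple attention-based fusion below are assumptions made for illustration, not the paper's implementation.

```python
# Minimal sketch (assumed, not the authors' code) of fusing a global vision
# feature with language features for scene-element descriptions.
import torch
import torch.nn as nn


class VLFeatureFusion(nn.Module):
    def __init__(self, vision_dim=2048, lang_dim=768, hidden_dim=512):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, hidden_dim)  # global scene feature
        self.lang_proj = nn.Linear(lang_dim, hidden_dim)      # scene-element text embeddings
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)

    def forward(self, vision_feat, lang_feats):
        # vision_feat: (B, vision_dim)   -- one global visual feature per clip
        # lang_feats:  (B, N, lang_dim)  -- N embedded scene-element descriptions
        q = self.vision_proj(vision_feat).unsqueeze(1)   # (B, 1, H)
        kv = self.lang_proj(lang_feats)                   # (B, N, H)
        fused, _ = self.attn(q, kv, kv)                   # vision attends to language
        return torch.cat([q, fused], dim=-1).squeeze(1)   # (B, 2H) VL feature


# Usage: one VL feature per clip, to be consumed by a paragraph decoder.
fusion = VLFeatureFusion()
vl = fusion(torch.randn(4, 2048), torch.randn(4, 12, 768))
print(vl.shape)  # torch.Size([4, 1024])
```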

arxiv captioning cv language learning video vision
