VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning. (arXiv:2206.12972v2 [cs.CV] UPDATED)
Aug. 9, 2022, 1:13 a.m. | Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le
cs.CV updates on arXiv.org
In this paper, we leverage the human perception process, which involves vision
and language interaction, to generate a coherent paragraph description of
untrimmed videos. We propose vision-language (VL) features consisting of two
modalities: (i) a vision modality that captures the global visual content of the
entire scene and (ii) a language modality that extracts descriptions of scene
elements, covering both human and non-human objects (e.g., animals, vehicles) and
visual and non-visual elements (e.g., relations, activities). Furthermore, we propose
to train our proposed …