all AI news
Cross-Modal Graph with Meta Concepts for Video Captioning. (arXiv:2108.06458v3 [cs.CV] UPDATED)
Aug. 2, 2022, 2:13 a.m. | Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
cs.CV updates on arXiv.org arxiv.org
Video captioning targets interpreting the complex visual contents as text
descriptions, which requires the model to fully understand video scenes
including objects and their interactions. Prevailing methods adopt
off-the-shelf object detection networks to give object proposals and use the
attention mechanism to model the relations between objects. They often miss
some undefined semantic concepts of the pretrained model and fail to identify
exact predicate relationships between objects. In this paper, we investigate an
open research task of generating text descriptions …
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Scientist (m/f/x/d)
@ Symanto Research GmbH & Co. KG | Spain, Germany
Enterprise Data Quality, Senior Analyst
@ Toyota North America | Plano
Data Analyst & Audit Management Software (AMS) Coordinator
@ World Vision | Philippines - Home Working
Product Manager Power BI Platform Tech I&E Operational Insights
@ ING | HBP (Amsterdam - Haarlerbergpark)
Sr. Director, Software Engineering, Clinical Data Strategy
@ Moderna | USA-Washington-Seattle-1099 Stewart Street
Data Engineer (Data as a Service)
@ Xplor | Atlanta, GA, United States