Diverse Video Captioning by Adaptive Spatio-temporal Attention. (arXiv:2208.09266v1 [cs.CV])
Aug. 22, 2022, 1:14 a.m. | Zohreh Ghaderi, Leonard Salewski, Hendrik P. A. Lensch
Source: cs.CV updates on arXiv.org
To generate proper captions for videos, the model must identify
relevant concepts and attend to the spatial relationships between them
as well as to the temporal development in the clip. Our end-to-end
encoder-decoder video captioning framework incorporates two transformer-based
architectures: an adapted transformer for a single joint spatio-temporal video
analysis and a self-attention-based decoder for advanced text
generation. Furthermore, we introduce an adaptive frame selection scheme to
reduce the number of required incoming frames while maintaining …