Aug. 22, 2022, 1:14 a.m. | Zohreh Ghaderi, Leonard Salewski, Hendrik P. A. Lensch

cs.CV updates on arXiv.org

To generate proper captions for videos, the inference needs to identify
relevant concepts and pay attention to the spatial relationships between them
as well as to the temporal development in the clip. Our end-to-end
encoder-decoder video captioning framework incorporates two transformer-based
architectures, an adapted transformer for a single joint spatio-temporal video
analysis as well as a self-attention-based decoder for advanced text
generation. Furthermore, we introduce an adaptive frame selection scheme to
reduce the number of required incoming frames while maintaining …

Tags: arxiv, attention, captioning, cv, temporal, video
