[R] Video Question Answering with Iterative Video-Text Co-Tokenization - Google 2022 - Reduces GFLOPs from 150-360 to only 67 while achieving new SOTA results on three main VideoQA benchmarks: MSRVTT-QA, MSVD-QA, and IVQA!
Aug. 11, 2022, 5:52 p.m. | /u/Singularian2501
Machine Learning www.reddit.com
[https://ai.googleblog.com/2022/08/efficient-video-text-learning-with.html](https://ai.googleblog.com/2022/08/efficient-video-text-learning-with.html)
Abstract:
>Video question answering is a challenging task that requires understanding jointly the language input, the visual information in individual video frames, as well as the temporal information about the events occurring in the video. In this paper, we propose a novel **multi-stream video encoder for video question answering that uses multiple video inputs and a new video-text iterative co-tokenization approach to answer a variety of questions related to videos**. We experimentally evaluate the model on several …