End-to-end Generative Pre-training for Multimodal Video Captioning

June 7, 2022, 5:24 p.m. | Google AI (noreply@blogger.com)

Google AI Blog ai.googleblog.com

Posted by Paul Hongsuck Seo and Arsha Nagrani, Research Scientists, Google Research, Perception Team

Multimodal video captioning systems utilize both the video frames and speech to generate natural language descriptions (captions) of videos. Such systems are stepping stones towards the longstanding goal of building multimodal conversational systems that effortlessly communicate with users while perceiving environments through multimodal input streams.

Unlike video understanding tasks (e.g., video classification and retrieval) where the key challenge lies in processing and understanding multimodal input …

