End-to-end Generative Pre-training for Multimodal Video Captioning
June 7, 2022, 5:24 p.m. | Google AI
Google AI Blog (ai.googleblog.com)
Multimodal video captioning systems use both the video frames and the speech to generate natural language descriptions (captions) of videos. Such systems are stepping stones toward the longstanding goal of building multimodal conversational systems that communicate effortlessly with users while perceiving their environments through multimodal input streams.
Unlike video understanding tasks (e.g., video classification and retrieval) where the key challenge lies in processing and understanding multimodal input …