Web: https://www.reddit.com/r/machinelearningnews/comments/vex1xw/google_ai_introduces_mvgpt_a_new_generative/

June 18, 2022, 3:13 a.m. | /u/No_Coffee_4638

🚦 Multimodal Video Generative Pre-Training (MV-GPT) jointly trains a multimodal video encoder and a sentence decoder on unlabelled videos, using a future utterance as the target text and formulating a novel bi-directional generation task.

🚦 It achieves state-of-the-art performance for multimodal video captioning on four standard benchmarks and for other video understanding tasks such as VideoQA, video retrieval and action classification.
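
To make the "future utterance as the target text" idea a bit more concrete, here is a rough sketch of how bi-directional training pairs could be built from a video's time-ordered transcript. This is only an illustration and not the authors' code: `Example` and `build_bidirectional_examples` are made-up names, and reading "bi-directional" as "also generate the present utterance from the future one" is an assumption that this post does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    clip_id: int      # index of the clip whose frames would go to the video encoder
    input_text: str   # utterance fed to the encoder together with the frames
    target_text: str  # utterance the sentence decoder is trained to generate
    direction: str    # "forward" or "backward"


def build_bidirectional_examples(utterances: List[str]) -> List[Example]:
    """Pair each utterance with the next one, in both directions (hypothetical helper)."""
    examples = []
    for i in range(len(utterances) - 1):
        present, future = utterances[i], utterances[i + 1]
        # Forward: frames + present utterance -> future utterance (the target text).
        examples.append(Example(i, present, future, "forward"))
        # Backward (assumed): frames + future utterance -> present utterance.
        examples.append(Example(i, future, present, "backward"))
    return examples
```

No labels are needed here because both the input and the target come from the video's own transcript, which is what lets the pre-training run on unlabelled videos.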

[Continue reading](https://www.marktechpost.com/2022/06/17/google-ai-introduces-mv-gpt-a-new-generative-pre-training-framework-for-multimodal-video-captioning/) | *Check out the* [*paper*](https://arxiv.org/pdf/2201.08264.pdf) *and* [*post*](https://ai.googleblog.com/2022/06/end-to-end-generative-pre-training-for.html)
