Web: https://www.reddit.com/r/machinelearningnews/comments/vex1xw/google_ai_introduces_mvgpt_a_new_generative/

June 18, 2022, 3:13 a.m. | /u/No_Coffee_4638


🚦 Multimodal Video Generative Pretraining (MV-GPT) jointly trains a multimodal video encoder and a sentence decoder on unlabelled videos. It uses a future utterance as the target text and formulates a novel bidirectional generation task.
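To make the "future utterance as target" idea concrete, here is a minimal data-preparation sketch. It assumes, as we read the linked paper, that the forward direction generates the future utterance from the present clip and present utterance, and the backward direction generates the present utterance from the present clip and future utterance. The `Example` class, `make_bidirectional_examples` function and the sample transcript are hypothetical illustrations, not Google's code, and the actual encoder/decoder training is out of scope.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    clip_id: str      # hypothetical ID for the video frames fed to the encoder
    input_text: str   # utterance given to the multimodal encoder
    target_text: str  # utterance the sentence decoder must generate
    direction: str    # "forward" or "backward"


def make_bidirectional_examples(segments: List[Tuple[str, str]]) -> List[Example]:
    """Build training pairs from time-ordered (clip_id, utterance) segments.

    Assumed scheme (our reading of the paper, not an official API):
      forward:  present clip + present utterance -> future utterance
      backward: present clip + future utterance  -> present utterance
    """
    examples: List[Example] = []
    # Pair each segment with the one that follows it in time.
    for (clip, present), (_, future) in zip(segments, segments[1:]):
        examples.append(Example(clip, present, future, "forward"))
        examples.append(Example(clip, future, present, "backward"))
    return examples


# Usage: an unlabelled cooking video with ASR transcript segments.
segments = [
    ("c0", "first we chop the onions"),
    ("c1", "then heat the oil"),
    ("c2", "add the onions to the pan"),
]
examples = make_bidirectional_examples(segments)
for ex in examples:
    print(ex.direction, ex.clip_id, "|", ex.input_text, "->", ex.target_text)
```

No human captions are needed: the transcript itself supplies both inputs and targets, which is what lets pretraining run on unlabelled videos.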

🚦 It achieves state-of-the-art performance on multimodal video captioning across four standard benchmarks, as well as on other video understanding tasks such as VideoQA, video retrieval and action classification.

[Continue reading](https://www.marktechpost.com/2022/06/17/google-ai-introduces-mv-gpt-a-new-generative-pre-training-framework-for-multimodal-video-captioning/) | *Check out the* [*paper*](https://arxiv.org/pdf/2201.08264.pdf) *and* [*post*](https://ai.googleblog.com/2022/06/end-to-end-generative-pre-training-for.html)

