MaskViT: Masked Visual Pre-Training for Video Prediction. (arXiv:2206.11894v1 [cs.CV])
June 24, 2022, 1:12 a.m. | Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
Source: cs.CV updates on arXiv.org
The ability to predict future visual observations conditioned on past
observations and motor commands can enable embodied agents to plan solutions to
a variety of tasks in complex environments. This work shows that we can create
good video prediction models by pre-training transformers via masked visual
modeling. Our approach, named MaskViT, is based on two simple design decisions.
First, for memory and training efficiency, we use two types of window
attention: spatial and spatiotemporal. Second, during training, we mask a …
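
The first design decision, restricting attention to windows, can be illustrated with a minimal sketch. The function names `window_partition` and `attention_pairs`, the 4x8x8 token grid, and both window shapes are assumptions chosen for illustration, not values taken from the paper. The sketch only groups token coordinates and counts query-key pairs; it does not implement a transformer.

```python
from itertools import product


def window_partition(grid, window):
    """Group token coordinates of a (T, H, W) grid into non-overlapping windows.

    Returns a list of windows, each a list of (t, y, x) coordinates.
    Under window attention, a token attends only to tokens in its own window.
    """
    T, H, W = grid
    wt, wh, ww = window
    if T % wt or H % wh or W % ww:
        raise ValueError("window must evenly divide the grid")
    windows = []
    for t0, y0, x0 in product(range(0, T, wt), range(0, H, wh), range(0, W, ww)):
        windows.append([
            (t0 + dt, y0 + dy, x0 + dx)
            for dt, dy, dx in product(range(wt), range(wh), range(ww))
        ])
    return windows


def attention_pairs(windows):
    """Count query-key pairs when attention is computed within each window."""
    return sum(len(w) ** 2 for w in windows)


# Hypothetical token grid: 4 frames, each 8x8 tokens (256 tokens in total).
grid = (4, 8, 8)

# Spatial windows (assumed shape): each window covers one whole frame.
spatial = window_partition(grid, (1, 8, 8))

# Spatiotemporal windows (assumed shape): a small 2x2 patch across all frames.
spatiotemporal = window_partition(grid, (4, 2, 2))

# Global attention over the same grid, for comparison: one window with everything.
full = window_partition(grid, (4, 8, 8))

print(attention_pairs(full))            # 65536
print(attention_pairs(spatial))         # 16384
print(attention_pairs(spatiotemporal))  # 4096
```

With these assumed shapes, both window types need far fewer query-key pairs than global attention over all 256 tokens, which is consistent with the memory and training efficiency motivation stated in the abstract.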