Sparse video tubes for joint video and image vision transformers
Google AI Blog ai.googleblog.com
Video understanding is a challenging problem that requires reasoning about both spatial information (e.g., for objects in a scene, including their locations and relations) and temporal information for activities or events shown in a video. There are many video understanding applications and tasks, such as understanding the semantic content of web videos and robot perception. However, current works, such as ViViT and TimeSFormer, densely process the video and …