all AI news
Motion Guided Token Compression for Efficient Masked Video Modeling
March 1, 2024, 5:46 a.m. | Yukun Feng, Yangming Shi, Fengze Liu, Tan Yan
cs.CV updates on arXiv.org arxiv.org
Abstract: Recent developments in Transformers have achieved notable strides in enhancing video comprehension. Nonetheless, the O($N^2$) computation complexity associated with attention mechanisms presents substantial computational hurdles when dealing with the high dimensionality of videos. This challenge becomes particularly pronounced when striving to increase the frames per second (FPS) to enhance the motion capturing capabilities. Such a pursuit is likely to introduce redundancy and exacerbate the existing computational limitations. In this paper, we initiate by showcasing the …
abstract arxiv attention attention mechanisms challenge complexity compression computation computational cs.ai cs.cv dimensionality modeling per token transformers type video videos
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
AI Engineering Manager
@ M47 Labs | Barcelona, Catalunya [Cataluña], Spain