Jan. 21, 2022, 2:10 a.m. | Chao-Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

cs.CV updates on arXiv.org

While today's video recognition systems parse snapshots or short clips
accurately, they cannot yet connect the dots and reason across a longer range
of time. Most existing video architectures can only process less than 5 seconds
of video before hitting computation or memory bottlenecks.


In this paper, we propose a new strategy to overcome this challenge. Instead
of trying to process more frames at once like most existing methods, we propose
to process videos in an online fashion and cache …
