Web: http://arxiv.org/abs/2206.06346

June 16, 2022, 1:13 a.m. | Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

cs.CV updates on arXiv.org

Recent action recognition models have achieved impressive results by
integrating objects, their locations and interactions. However, obtaining dense
structured annotations for each frame is tedious and time-consuming, making
these methods expensive to train and less scalable. At the same time, if a
small set of annotated images is available, either within or outside the domain
of interest, how could we leverage these for a downstream video task? We
propose a learning framework, StructureViT (SViT for short), which demonstrates
how utilizing …

