Web: http://arxiv.org/abs/2206.11895

June 24, 2022, 1:12 a.m. | Jinghuan Shang, Srijan Das, Michael S. Ryoo

cs.CV updates on arXiv.org

Humans are remarkably flexible in understanding viewpoint changes because the
visual cortex supports the perception of 3D structure. In contrast, most
computer vision models that learn visual representations from a pool of 2D
images often fail to generalize to novel camera viewpoints. Recently,
vision architectures have shifted towards convolution-free designs, namely
visual Transformers, which operate on tokens derived from image patches.
However, neither these Transformers nor 2D convolutional networks perform
explicit operations to learn viewpoint-agnostic representations for visual …

