Jan. 21, 2022, 2:11 a.m. | Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra

cs.LG updates on arXiv.org arxiv.org

Prior work has studied different visual modalities in isolation and developed
separate architectures for recognition of images, videos, and 3D data. Instead,
in this paper, we propose a single model which excels at classifying images,
videos, and single-view 3D data using exactly the same model parameters. Our
'Omnivore' model leverages the flexibility of transformer-based architectures
and is trained jointly on classification tasks from different modalities.
Omnivore is simple to train, uses off-the-shelf standard datasets, and performs
at-par or better than …

arxiv cv

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Enterprise Data Architect

@ Pathward | Remote

Diagnostic Imaging Information Systems (DIIS) Technologist

@ Nova Scotia Health Authority | Halifax, NS, CA, B3K 6R8

Intern Data Scientist - Residual Value Risk Management (f/m/d)

@ BMW Group | Munich, DE

Analytics Engineering Manager

@ PlayStation Global | United Kingdom, London

Junior Insight Analyst (PR&Comms)

@ Signal AI | Lisbon, Lisbon, Portugal