Omnivore: A Single Model for Many Visual Modalities. (arXiv:2201.08377v1 [cs.CV])
Jan. 21, 2022, 2:11 a.m. | Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra
cs.LG updates on arXiv.org
Prior work has studied different visual modalities in isolation and developed
separate architectures for recognition of images, videos, and 3D data. Instead,
in this paper, we propose a single model which excels at classifying images,
videos, and single-view 3D data using exactly the same model parameters. Our
'Omnivore' model leverages the flexibility of transformer-based architectures
and is trained jointly on classification tasks from different modalities.
Omnivore is simple to train, uses off-the-shelf standard datasets, and performs
at-par or better than …