Transfer-learning for video classification: Video Swin Transformer on multiple domains. (arXiv:2210.09969v1 [cs.CV])
Oct. 19, 2022, 1:16 a.m. | Daniel Oliveira, David Martins de Matos
cs.CV updates on arXiv.org
The computer vision community has seen a shift from convolutional architectures to
pure transformer architectures for both image and video tasks. Training a
transformer from scratch for these tasks typically requires large amounts of data and
computational resources. Video Swin Transformer (VST) is a pure-transformer
model developed for video classification that achieves state-of-the-art
results in accuracy and efficiency on several datasets. In this paper, we aim
to understand whether VST generalizes well enough to be used in an out-of-domain
setting. We …
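The transfer-learning setup the abstract describes — reusing a pretrained VST backbone on an out-of-domain video dataset — commonly amounts to freezing the pretrained weights and training only a new classification head. The sketch below illustrates that pattern in PyTorch; the backbone here is a hypothetical stand-in (a single linear layer), not the actual Video Swin Transformer, and the feature width, clip shape, and class count are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained video backbone. In practice this
# would be a pretrained Video Swin Transformer producing a feature vector
# per clip (e.g. 768-dimensional, as assumed here).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 768))

# Freeze the pretrained backbone so only the new head receives gradients.
for p in backbone.parameters():
    p.requires_grad = False

num_target_classes = 5  # classes in the out-of-domain dataset (assumption)
head = nn.Linear(768, num_target_classes)  # new, trainable classification head

model = nn.Sequential(backbone, head)

# Dummy clip batch: 2 clips, 3 channels, 8 frames, 32x32 spatial resolution.
clip = torch.randn(2, 3, 8, 32, 32)
logits = model(clip)
print(tuple(logits.shape))  # (2, 5)

# Only the head's parameters are trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 768 * 5 + 5 = 3845
```

An optimizer would then be constructed over `head.parameters()` only, which is what makes fine-tuning far cheaper than training from scratch.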