July 21, 2022, 1:12 a.m. | Nikolaos Gkalelis, Dimitrios Daskalakis, Vasileios Mezaris

cs.CV updates on arXiv.org arxiv.org

In this paper a pure-attention bottom-up approach, called ViGAT, that
utilizes an object detector together with a Vision Transformer (ViT) backbone
network to derive object and frame features, and a head network to process
these features for the task of event recognition and explanation in video, is
proposed. The ViGAT head consists of graph attention network (GAT) blocks
factorized along the spatial and temporal dimensions in order to capture
effectively both local and long-term dependencies between objects or frames.
Moreover, …

arxiv attention cv event graph network video

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

.NET Software Engineer (AI Focus)

@ Boskalis | Papendrecht, Netherlands