June 11, 2024, 4:50 a.m. | Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

cs.CV updates on arXiv.org

arXiv:2406.06187v1 Announce Type: new
Abstract: Unlike the sparse-label action detection task, where a single action occurs at each timestamp of a video, in a dense multi-label scenario actions can overlap. To address this challenging task, it is necessary to simultaneously learn (i) temporal dependencies and (ii) co-occurrence relationships between actions. Recent approaches model temporal information by extracting multi-scale features through hierarchical transformer-based networks. However, the self-attention mechanism in transformers inherently loses temporal positional information. We argue that combining this with …
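
The abstract's remark that self-attention loses temporal positional information can be illustrated with a small, self-contained sketch. It is not the authors' method: it assumes a single attention head with identity projections (Q = K = V = input) and no positional encoding, and the names `self_attention`, `frames`, and `perm` are hypothetical. Under those assumptions, shuffling the input frames only shuffles the outputs in the same way, so the layer by itself cannot tell in which order the frames occurred.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: identity projections (Q = K = V = x)
    and no positional encoding added to the inputs.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted average of all frames.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


random.seed(0)
# Hypothetical toy input: 5 timesteps, 4-dimensional features per frame.
frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]
shuffled = [frames[i] for i in perm]

out = self_attention(frames)
out_shuffled = self_attention(shuffled)

# Row i of the shuffled output should match row perm[i] of the original output.
max_diff = max(
    abs(a - b)
    for i, p in enumerate(perm)
    for a, b in zip(out_shuffled[i], out[p])
)
print(max_diff < 1e-9)
```

Any temporal order the model uses therefore has to be injected separately, for example through positional encodings added to the frame features.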

