April 30, 2024, 4:48 a.m. | Quoc-Huy Tran, Muhammad Ahmed, Murad Popattia, M. Hassan Ahmed, Andrey Konin, M. Zeeshan Zia

cs.CV updates on arXiv.org

arXiv:2305.19480v5 Announce Type: replace
Abstract: This paper presents a self-supervised temporal video alignment framework that is useful for several fine-grained human activity understanding applications. In contrast with the state-of-the-art method CASA, which takes sequences of 3D skeleton coordinates directly as input, our key idea is to use sequences of 2D skeleton heatmaps as input. Unlike CASA, which performs self-attention in the temporal domain only, we feed 2D skeleton heatmaps to a video transformer which performs self-attention both in …
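
To make the input representation concrete, here is a minimal sketch of how a sequence of 2D joint locations could be turned into a sequence of per-joint heatmaps. It is not taken from the paper: the Gaussian rendering, the `sigma` value, the toy image size, and the helper names `joints_to_heatmaps` and `sequence_to_heatmaps` are all assumptions chosen for illustration.

```python
import math

def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """Render one heatmap per joint for a single frame.

    Assumption (not from the abstract): each joint is drawn as an
    unnormalized 2D Gaussian centered at its (x, y) pixel location.
    """
    heatmaps = []
    for jx, jy in joints:
        heatmap = [
            [
                math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2.0 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        heatmaps.append(heatmap)
    return heatmaps

def sequence_to_heatmaps(sequence, height, width, sigma=2.0):
    """Apply the per-frame rendering to every frame of a clip.

    The result is indexed as clip[frame][joint][y][x].
    """
    return [joints_to_heatmaps(frame, height, width, sigma) for frame in sequence]

# Toy clip (hypothetical values): 2 frames, 3 joints, 10x8 pixel grid.
sequence = [
    [(2, 3), (5, 5), (7, 1)],
    [(3, 3), (5, 6), (7, 2)],
]
clip = sequence_to_heatmaps(sequence, height=8, width=10)
```

A stack like `clip` keeps the spatial layout of the joints in every frame, which is the kind of input a model attending over both space and time can exploit; a flat list of coordinates does not expose that layout directly.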
