Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos. (arXiv:2205.05854v1 [cs.CV])
May 13, 2022, 1:10 a.m. | Shuo Yang, Xinxiao Wu
cs.CV updates on arXiv.org
Language-driven action localization in videos is a challenging task that
involves not only visual-linguistic matching but also action boundary
prediction. Recent progress has been achieved by aligning language queries
with video segments, but estimating precise boundaries is still under-explored.
In this paper, we propose entity-aware and motion-aware Transformers that
progressively localize actions in videos by first coarsely locating clips with
entity queries and then finely predicting exact boundaries in a shrunken
temporal region with motion queries. The entity-aware Transformer incorporates …