all AI news
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
April 24, 2024, 4:45 a.m. | Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang
cs.CV updates on arXiv.org arxiv.org
Abstract: As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) recently gains increasing attention, with the development of vision-language pre-trainings. To enable generalization of arbitrary classes, existing methods treat class labels as text descriptions, then formulate OVAR as evaluating embedding similarity between visual samples and textual classes. However, one crucial issue is completely ignored: the class descriptions given by users may be noisy, e.g., misspellings and typos, limiting the real-world practicality …
abstract action recognition arxiv attention class computer computer vision cs.cv development embedding fundamental labels language recognition robustness tasks text type video vision vision-language visual
More from arxiv.org / cs.CV updates on arXiv.org
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
1 day, 2 hours ago |
arxiv.org
Fingerprint Matching with Localized Deep Representation
1 day, 2 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne