Feb. 14, 2024, 5:47 a.m. | Xiaogang Peng Hao Wen Yikai Luo Xiao Zhou Keyang Yu Ping Yang Zizhao Wu

cs.CV updates on arXiv.org arxiv.org

In recent years, the task of weakly supervised audio-visual violence detection has gained considerable attention. The goal of this task is to identify violent segments within multimodal data based on video-level labels. Despite advances in this field, traditional Euclidean neural networks, which have been used in prior research, encounter difficulties in capturing highly discriminative representations due to limitations of the feature space. To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve …

advances attention audio cs.cv data detection identify labels multimodal multimodal data networks neural networks prior research space video violence visual

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York