Aug. 11, 2023, 6:52 a.m. | Kyle Min

cs.CV updates on arXiv.org arxiv.org

This report introduces our novel method named STHG for the Audio-Visual
Diarization task of the Ego4D Challenge 2023. Our key innovation is that we
model all the speakers in a video using a single, unified heterogeneous graph
learning framework. Unlike previous approaches that require a separate
component solely for the camera wearer, STHG can jointly detect the speech
activities of all people including the camera wearer. Our final method obtains
61.1% DER on the test set of Ego4D, which significantly …

advanced arxiv audio challenge diarization framework graph graph learning innovation novel report speakers temporal video

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York