July 8, 2022, 1:12 a.m. | Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe

cs.CV updates on arXiv.org arxiv.org

In challenging real-life conditions such as extreme head poses, occlusions,
and low-resolution images, where visual information alone is insufficient to
estimate visual attention/gaze direction, audio signals can provide important
and complementary information. In this paper, we explore whether audio-guided
coarse head-pose can further enhance visual attention estimation performance for
non-prolific faces. Since it is difficult to annotate audio signals for
estimating the head-pose of the speaker, we use off-the-shelf state-of-the-art
models to facilitate cross-modal weak-supervision. During the training phase,
the framework learns …

