March 8, 2024, 5:42 a.m. | R. Gnana Praveen, Jahangir Alam

cs.LG updates on arXiv.org

arXiv:2403.04661v1 Announce Type: cross
Abstract: Although person or identity verification has been predominantly explored using individual modalities such as face and voice, audio-visual fusion has recently shown immense potential to outperform unimodal approaches. Audio and visual modalities are often expected to exhibit strong complementary relationships, which play a crucial role in effective audio-visual fusion. However, they may not always strongly complement each other; they may also exhibit weak complementary relationships, resulting in poor audio-visual feature representations. In this paper, we …
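The abstract is cut off before the method is described, but the tags suggest an attention-based fusion of face and voice embeddings. As rough context only, here is a minimal sketch of cross-attention audio-visual fusion in PyTorch; the module name, dimensions, and mean pooling are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative audio-visual fusion via cross-attention (not the paper's exact model)."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        # Each modality attends to the other, letting complementary cues flow both ways.
        self.a2v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, audio, visual):
        # audio, visual: (batch, seq_len, dim) frame-level embeddings
        v_att, _ = self.a2v(query=visual, key=audio, value=audio)
        a_att, _ = self.v2a(query=audio, key=visual, value=visual)
        # Pool over time and concatenate into a joint identity embedding.
        fused = torch.cat([v_att.mean(dim=1), a_att.mean(dim=1)], dim=-1)
        return self.proj(fused)

# Usage: fuse frame-level face and voice embeddings for a batch of clips.
fusion = CrossModalFusion(dim=512)
audio = torch.randn(4, 25, 512)   # e.g. 25 audio frames per clip
visual = torch.randn(4, 25, 512)  # e.g. 25 video frames per clip
joint = fusion(audio, visual)     # (4, 512) joint embedding for verification
```

When the modalities complement each other only weakly, a fixed cross-attention scheme like this can degrade the joint representation, which is the failure mode the abstract highlights.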

