Unsupervised active speaker detection in media content using cross-modal information. (arXiv:2209.11896v1 [eess.IV])

Sept. 27, 2022, 1:12 a.m. | Rahul Sharma, Shrikanth Narayanan

cs.CV updates on arXiv.org arxiv.org

We present a cross-modal unsupervised framework for active speaker detection
in media content such as TV shows and movies. Machine learning advances have
enabled impressive performance in identifying individuals from speech and
facial images. We leverage speaker identity information from speech and faces,
and formulate active speaker detection as a speech-face assignment task such
that the active speaker's face and the underlying speech identify the same
person (character). We express the speech segments in terms of their associated
speaker identity …
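The speech-face assignment idea can be illustrated with a toy sketch. It is not the authors' method, since the abstract is truncated and the actual formulation is not shown here. The sketch assumes that each speech segment already carries a speaker cluster label and a list of face cluster labels visible on screen; these labels, the function name `assign_faces`, and the co-occurrence heuristic are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def assign_faces(segments):
    """Toy speech-face assignment (illustrative assumption, not the paper's algorithm).

    segments: list of (speaker_id, [face_id, ...]) pairs, one per speech segment.
    Returns one entry per segment: the face_id picked as the active speaker,
    or None when the face linked to that speaker is not visible.
    """
    # Count how often each speaker cluster co-occurs with each visible face cluster.
    cooccur = defaultdict(Counter)
    for speaker_id, face_ids in segments:
        for face_id in set(face_ids):
            cooccur[speaker_id][face_id] += 1

    # Link each speaker cluster to the face cluster it co-occurs with most often.
    speaker_to_face = {
        spk: counts.most_common(1)[0][0] for spk, counts in cooccur.items()
    }

    # A segment's active speaker is the linked face, provided it is on screen.
    result = []
    for speaker_id, face_ids in segments:
        face = speaker_to_face.get(speaker_id)
        result.append(face if face in face_ids else None)
    return result


if __name__ == "__main__":
    demo = [
        ("s1", ["A", "B"]),
        ("s1", ["A", "C"]),
        ("s2", ["A", "B"]),
        ("s2", ["B", "C"]),
        ("s1", ["D"]),  # the face linked to s1 is off screen here
    ]
    print(assign_faces(demo))
```

The heuristic enforces the constraint stated in the abstract only loosely: segments that share a speaker identity are mapped to faces that share a face identity. It does not force distinct speakers onto distinct faces, and ties in the co-occurrence counts are broken arbitrarily.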
