Sept. 27, 2022, 1:12 a.m. | Rahul Sharma, Shrikanth Narayanan

cs.CV updates on arXiv.org

We present a cross-modal unsupervised framework for active speaker detection
in media content such as TV shows and movies. Machine learning advances have
enabled impressive performance in identifying individuals from speech and
facial images. We leverage speaker identity information from speech and faces,
and formulate active speaker detection as a speech-face assignment task such
that the active speaker's face and the underlying speech identify the same
person (character). We express the speech segments in terms of their associated
speaker identity …
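The speech-face assignment idea can be illustrated with a small sketch. This is not the authors' method. It assumes, hypothetically, that each speech segment and each face track already has an identity embedding in a shared space, and that we know which face tracks are on screen during each segment. The helper names (`cosine`, `assign_active_speakers`), the cosine-similarity score, and the `min_sim` threshold are illustrative assumptions. Each segment is assigned to the visible face whose identity best matches its speaker identity, or to no face if nothing matches well enough (for example, an off-screen speaker).

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def assign_active_speakers(
    speech_embs: List[List[float]],
    face_embs: List[List[float]],
    visible_faces: List[List[int]],
    min_sim: float = 0.0,
) -> List[Optional[int]]:
    """Hypothetical helper: for each speech segment, return the index of the
    visible face whose identity embedding best matches the segment's speaker
    embedding.

    Assumes (not from the source) that speech and face embeddings share one
    identity space. Returns None for a segment when no face is visible or no
    candidate's similarity exceeds min_sim.
    """
    assignments: List[Optional[int]] = []
    for seg_emb, candidates in zip(speech_embs, visible_faces):
        best_face: Optional[int] = None
        best_sim = min_sim
        for f in candidates:
            s = cosine(seg_emb, face_embs[f])
            if s > best_sim:  # keep the most similar on-screen face
                best_sim, best_face = s, f
        assignments.append(best_face)
    return assignments


# Toy example: two face tracks and three speech segments (the last has no visible face).
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
faces = [[0.9, 0.1], [0.1, 0.9]]
visible = [[0, 1], [0, 1], []]
print(assign_active_speakers(speech, faces, visible))  # [0, 1, None]
```

The sketch makes an independent, greedy choice for each segment. A variant could solve a joint one-to-one assignment between speakers and faces instead, but that is likewise an assumption rather than something the abstract states.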
