Unsupervised active speaker detection in media content using cross-modal information. (arXiv:2209.11896v1 [eess.IV])
Sept. 27, 2022, 1:12 a.m. | Rahul Sharma, Shrikanth Narayanan
cs.CV updates on arXiv.org
We present a cross-modal unsupervised framework for active speaker detection
in media content such as TV shows and movies. Machine learning advances have
enabled impressive performance in identifying individuals from speech and
facial images. We leverage speaker identity information from speech and faces,
and formulate active speaker detection as a speech-face assignment task such
that the active speaker's face and the underlying speech identify the same
person (character). We express the speech segments in terms of their associated
speaker identity …
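The excerpt states the goal but cuts off before the method. The goal is to assign each speech segment one concurrently visible face so that the face and the speech identify the same person. The sketch below shows one hypothetical way to score and search such assignments, assuming that within-modality similarity matrices are already available (speech-to-speech and face-to-face). The consistency rule it uses is that segments that sound like the same speaker should receive faces of the same person. The squared-mismatch cost, the exhaustive search, the toy data, and the names `assignment_cost` and `assign_speakers` are illustrative assumptions, not the authors' algorithm.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def assignment_cost(assign: Sequence[int], speech_sim: Matrix, face_sim: Matrix) -> float:
    """Hypothetical cost: sum of squared mismatches between speech-to-speech and
    face-to-face similarity over every pair of speech segments (lower is better)."""
    cost = 0.0
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            diff = speech_sim[i][j] - face_sim[assign[i]][assign[j]]
            cost += diff * diff
    return cost


def assign_speakers(
    speech_sim: Matrix, face_sim: Matrix, candidates: Sequence[Sequence[int]]
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Try every choice of one concurrently visible face track per speech
    segment (exhaustive search) and return the lowest-cost assignment."""
    best, best_cost = None, float("inf")
    for assign in product(*candidates):
        cost = assignment_cost(assign, speech_sim, face_sim)
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost


# Toy data (hypothetical): 3 speech segments, 5 face tracks.
SAME, DIFF = 0.9, 0.1
person = ["X", "Y", "Y", "X", "Z"]  # identity behind each face track
face_sim = [[SAME if person[a] == person[b] else DIFF for b in range(5)] for a in range(5)]
speech_sim = [  # segments 0 and 2 sound like the same speaker
    [1.0, DIFF, SAME],
    [DIFF, 1.0, DIFF],
    [SAME, DIFF, 1.0],
]
candidates = [[0, 1], [2], [3, 4]]  # face tracks on screen during each segment

best, best_cost = assign_speakers(speech_sim, face_sim, candidates)
print(best, best_cost)  # (0, 2, 3) 0.0
```

In this sketch, segments 0 and 2 are matched to tracks 0 and 3 (both person X) because that choice reproduces the speech-side identity structure exactly. The exhaustive search grows exponentially with the number of segments, so it is only a way to make the objective concrete on toy data.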