Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. (arXiv:2208.13085v3 [eess.AS] UPDATED)
Sept. 27, 2022, 1:14 a.m. | Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu
cs.CL updates on arXiv.org | arxiv.org
This paper describes a speaker diarization model based on target speaker
voice activity detection (TS-VAD) using transformers. To overcome the original
TS-VAD model's drawback of being unable to handle an arbitrary number of
speakers, we investigate model architectures that use input tensors with
variable-length time and speaker dimensions. Transformer layers are applied to
the speaker axis to make the model output insensitive to the order of the
speaker profiles provided to the TS-VAD model. Time-wise sequential layers are
interspersed between …
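
To illustrate why applying attention along the speaker axis can make a model insensitive to the order of the speaker profiles, here is a minimal sketch in plain Python. It is not the authors' architecture: the function names (`speaker_axis_attention`, `apply_over_time`), the single-head attention without positional encoding, the random weights, and the toy tensor sizes are all assumptions made for illustration. The sketch checks that reordering the speakers only reorders the per-speaker outputs in the same way, and that the same weights accept any number of speakers and frames.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_axis_attention(frame, Wq, Wk, Wv):
    """Single-head self-attention across the speakers of one time frame.

    frame: list of S feature vectors (one per speaker profile), each of size D.
    No positional encoding is added, so speakers carry no notion of order.
    """
    d = len(Wq)
    queries = [matvec(Wq, v) for v in frame]
    keys = [matvec(Wk, v) for v in frame]
    values = [matvec(Wv, v) for v in frame]
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)])
    return out


def apply_over_time(x, Wq, Wk, Wv):
    """Apply the speaker-axis layer independently to every frame of x[T][S][D]."""
    return [speaker_axis_attention(frame, Wq, Wk, Wv) for frame in x]


def random_tensor(rng, *shape):
    """Nested lists of uniform random floats with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [random_tensor(rng, *shape[1:]) for _ in range(shape[0])]


rng = random.Random(0)
D = 4  # hypothetical feature size
Wq, Wk, Wv = (random_tensor(rng, D, D) for _ in range(3))

# Toy input: 3 time frames, 3 speaker profiles.
x = random_tensor(rng, 3, 3, D)
perm = [2, 0, 1]  # a reordering of the speaker profiles
x_perm = [[frame[p] for p in perm] for frame in x]

y = apply_over_time(x, Wq, Wk, Wv)
y_perm = apply_over_time(x_perm, Wq, Wk, Wv)

# Output slot j of the permuted run should match output slot perm[j] of the
# original run, up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for t in range(len(x))
    for j, p in enumerate(perm)
    for a, b in zip(y_perm[t][j], y[t][p])
)

# The same weights also accept a different number of frames and speakers.
y_var = apply_over_time(random_tensor(rng, 7, 5, D), Wq, Wk, Wv)
```

In this sketch `max_diff` stays at rounding-error level, which is the permutation-equivariance property the abstract relies on. Because the weights depend only on the feature size `D`, neither the time length nor the speaker count has to be fixed in advance.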