Sept. 27, 2022, 1:14 a.m. | Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu

cs.CL updates on arXiv.org

This paper describes a speaker diarization model based on target speaker
voice activity detection (TS-VAD) using transformers. To overcome the original
TS-VAD model's drawback of being unable to handle an arbitrary number of
speakers, we investigate model architectures that use input tensors with
variable-length time and speaker dimensions. Transformer layers are applied to
the speaker axis to make the model output insensitive to the order of the
speaker profiles provided to the TS-VAD model. Time-wise sequential layers are
interspersed between …
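
The order-insensitivity property described above can be illustrated with a small sketch. It is not the paper's model. It assumes a single self-attention head with identity query/key/value projections and no positional encoding, applied across a list of hypothetical speaker profile vectors. Under those assumptions, reordering the profiles only reorders the outputs in the same way, and the same function accepts any number of speakers. The names `self_attention`, `permute`, and `profiles` are made up for this example.

```python
import math


def self_attention(profiles):
    """Single-head self-attention across the speaker axis.

    Assumptions for illustration: identity Q/K/V projections and no
    positional encoding, so nothing depends on a profile's index.
    """
    dim = len(profiles[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in profiles:
        # Scaled dot-product score of this speaker against every speaker.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in profiles]
        # Numerically stable softmax over the speaker axis.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the speaker vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, profiles)) for d in range(dim)]
        )
    return outputs


def permute(rows, order):
    """Reorder rows so that new row i is old row order[i]."""
    return [rows[i] for i in order]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two row lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Four hypothetical speaker profiles with three features each.
profiles = [
    [0.2, -1.0, 0.5],
    [1.5, 0.3, -0.7],
    [-0.4, 0.8, 1.1],
    [0.9, 0.9, -0.2],
]
order = [2, 0, 3, 1]

# Attend first and then reorder, versus reorder first and then attend.
attend_then_permute = permute(self_attention(profiles), order)
permute_then_attend = self_attention(permute(profiles, order))
gap = max_abs_diff(attend_then_permute, permute_then_attend)
print(gap < 1e-9)
```

The comparison uses a tolerance because reordering the profiles changes the floating-point summation order. Adding a positional encoding along the speaker axis would break this property, since the output would then depend on each profile's index.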
