RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation
March 22, 2024, 4:46 a.m. | Samuel Pegg, Kai Li, Xiaolin Hu
cs.CV updates on arXiv.org
Abstract: Audio-visual speech separation methods aim to integrate different modalities to generate high-quality separated speech, thereby enhancing the performance of downstream tasks such as speech recognition. Most existing state-of-the-art (SOTA) models operate in the time domain. However, their overly simplistic approach to modeling acoustic features often necessitates larger and more computationally intensive models to achieve SOTA performance. In this paper, we present a novel time-frequency domain audio-visual speech separation method: Recurrent Time-Frequency Separation Network …
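To make the time-domain vs. time-frequency-domain distinction concrete, here is a minimal, hypothetical sketch of time-frequency masking, the general family of approaches the paper's method belongs to. This is not RTFS-Net itself: the signals, STFT parameters, and the oracle "ideal ratio mask" (which a real system would predict with a neural network) are all illustrative assumptions. The point is that sources which overlap in time often occupy distinct time-frequency bins, so separation can be done by weighting the mixture's spectrogram.

```python
import numpy as np
from scipy.signal import stft, istft

# Toy mixture of two sources that overlap in time but not in frequency.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)    # source 1: 440 Hz tone
s2 = np.sin(2 * np.pi * 2000 * t)   # source 2: 2 kHz tone
mix = s1 + s2

# Transform sources and mixture into the time-frequency (T-F) domain.
_, _, S1 = stft(s1, fs=fs, nperseg=512)
_, _, S2 = stft(s2, fs=fs, nperseg=512)
_, _, M = stft(mix, fs=fs, nperseg=512)

# Oracle ideal ratio mask; in an actual separation model this mask
# (or the masked spectrogram directly) is predicted by a network,
# possibly conditioned on visual features of the target speaker.
eps = 1e-8
mask1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)

# Apply the mask to the mixture spectrogram and invert back to time.
_, est1 = istft(mask1 * M, fs=fs, nperseg=512)
est1 = est1[: len(s1)]

# The masked estimate is far closer to source 1 than the raw mixture.
err_est = np.mean((est1 - s1) ** 2)
err_mix = np.mean((mix - s1) ** 2)
print(err_est < err_mix)
```

Because the mask only rescales magnitudes per T-F bin, this kind of model can be much lighter than a time-domain model that must learn an acoustic representation from raw waveforms, which is the efficiency argument the abstract makes.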