June 6, 2024, 4:43 a.m. | Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

cs.LG updates on arXiv.org

arXiv:2406.02649v1 Announce Type: cross
Abstract: Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and with specialized jargon. In this paper, we propose a novel approach to improving jargon word recognition through contextual biasing of Whisper-based models. We employ a keyword spotting model that leverages the Whisper encoder representation to dynamically generate prompts that guide the decoder during transcription. We introduce two approaches …
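The abstract describes a pipeline in which keywords are spotted from encoder representations and then turned into a decoder prompt. The toy sketch below illustrates that general idea only. It is not the paper's method, and the abstract does not say how its keyword spotter works. The sketch makes several assumptions. It treats encoder output as a list of per-frame feature vectors. It scores each jargon term by the best cosine similarity between a hypothetical keyword embedding and any frame. It keeps the terms above a threshold and joins them into a comma-separated prompt. The names `spot_keywords` and `build_prompt`, the embeddings, and the threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def spot_keywords(
    frames: List[Vector],
    keyword_embs: Dict[str, Vector],
    threshold: float = 0.8,
) -> List[str]:
    """Hypothetical keyword spotter over encoder frames.

    Each keyword is scored by its best match across all frames. Keywords
    scoring at or above `threshold` are returned, highest score first
    (ties broken alphabetically).
    """
    scores = {
        word: max(cosine(frame, emb) for frame in frames)
        for word, emb in keyword_embs.items()
    }
    hits = [word for word, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda w: (-scores[w], w))


def build_prompt(keywords: List[str]) -> str:
    """Join spotted keywords into a plain-text prompt for the decoder."""
    return ", ".join(keywords)


if __name__ == "__main__":
    # Toy 2-D "encoder frames" and keyword embeddings (illustrative only).
    frames = [[1.0, 0.0], [0.0, 1.0]]
    keyword_embs = {
        "tachycardia": [0.9, 0.1],  # best match ~0.994
        "stent": [0.7, 0.7],        # best match ~0.707
        "biopsy": [-1.0, 0.0],      # best match 0.0
    }
    print(build_prompt(spot_keywords(frames, keyword_embs, threshold=0.7)))
```

With a threshold of 0.7, the script prints `tachycardia, stent`, and "biopsy" is filtered out. A prompt string like this could be passed to the `initial_prompt` argument of `transcribe` in the openai-whisper package. The abstract does not state whether the authors inject prompts this way.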

