April 30, 2024, 4:51 a.m. | Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

cs.CL updates on arXiv.org

arXiv:2310.14278v2 | Announce Type: replace-cross
Abstract: Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the Conformer encoder-decoder model with cross-modal conversational representation. Our approach leverages a cross-modal extractor that combines pre-trained speech and text models through a specialized encoder and …
