Feb. 21, 2024, 5:48 a.m. | Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe

cs.CL updates on arXiv.org

arXiv:2402.12654v1 Announce Type: new
Abstract: There has been increasing interest in large speech models that can perform multiple speech processing tasks in a single model. Such models usually adopt the encoder-decoder or decoder-only architecture due to their popularity and good performance in many domains. However, autoregressive models can be slower at inference than non-autoregressive models and also carry a risk of hallucination. Though prior studies observed promising results with non-autoregressive models for certain tasks at small scales, it …
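The speed gap the abstract mentions comes from the decoding pattern: an autoregressive (AR) decoder needs one forward pass per output token, while a non-autoregressive (NAR) decoder (e.g., CTC-style) emits all positions in a single parallel pass. Below is a minimal toy sketch of that contrast; it is not the paper's model, and all module names, shapes, and the BOS convention are illustrative assumptions.

```python
import torch

# Toy contrast (not the paper's architecture): AR decoding runs T sequential
# steps, NAR decoding is one parallel pass over all encoder frames.
VOCAB, HIDDEN, MAX_LEN = 32, 16, 8  # assumed toy sizes

class TinyARDecoder(torch.nn.Module):
    """Minimal autoregressive decoder: next token depends on the previous one."""
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, HIDDEN)
        self.rnn = torch.nn.GRUCell(HIDDEN, HIDDEN)
        self.out = torch.nn.Linear(HIDDEN, VOCAB)

    def forward(self, token, state):
        state = self.rnn(self.embed(token), state)
        return self.out(state), state

@torch.no_grad()
def ar_decode(dec, enc_state):
    # One forward pass per token: inference latency grows with output length.
    token = torch.zeros(1, dtype=torch.long)  # assume token 0 = BOS
    state, out = enc_state, []
    for _ in range(MAX_LEN):
        logits, state = dec(token, state)
        token = logits.argmax(-1)  # greedy choice feeds the next step
        out.append(token.item())
    return out

@torch.no_grad()
def nar_decode(head, enc_frames):
    # CTC-style greedy NAR decoding: every frame is classified independently
    # in a single parallel pass, so latency does not scale with output length.
    return head(enc_frames).argmax(-1).tolist()

enc_state = torch.randn(1, HIDDEN)          # stand-in for an encoder summary
enc_frames = torch.randn(MAX_LEN, HIDDEN)   # stand-in for per-frame encodings
print("AR :", ar_decode(TinyARDecoder(), enc_state))
print("NAR:", nar_decode(torch.nn.Linear(HIDDEN, VOCAB), enc_frames))
```

The same contrast also hints at the hallucination point: the AR loop conditions each token on its own previous guesses, so an early error can propagate, whereas the NAR head scores each position directly from the encoder output.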
