all AI news
T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5. (arXiv:2211.00586v1 [cs.CL])
cs.CL updates on arXiv.org arxiv.org
In Spoken language understanding (SLU), a natural solution is concatenating
pre-trained speech models (e.g. HuBERT) and pretrained language models (PLM,
e.g. T5). Most previous works use pretrained language models with subword-based
tokenization. However, the granularity of input units affects the alignment of
speech model outputs and language model inputs, and PLM with character-based
tokenization is underexplored. In this work, we conduct extensive studies on
how PLMs with different tokenization strategies affect spoken language
understanding task including spoken question answering (SQA) …
arxiv language speech spoken language understanding text understanding