Nov. 2, 2022, 1:15 a.m. | Chan-Jan Hsu, Ho-Lam Chung, Hung-yi Lee, Yu Tsao

cs.CL updates on arXiv.org arxiv.org

In spoken language understanding (SLU), a natural solution is to concatenate
pre-trained speech models (e.g., HuBERT) with pre-trained language models
(PLMs, e.g., T5). Most previous works use pre-trained language models with
subword-based tokenization. However, the granularity of input units affects the
alignment between speech model outputs and language model inputs, and PLMs with
character-based tokenization remain underexplored. In this work, we conduct
extensive studies on how PLMs with different tokenization strategies affect
spoken language understanding tasks, including spoken question answering (SQA) …

arxiv language speech spoken language understanding text understanding
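The abstract contrasts subword-based and character-based tokenization for the pre-trained language model, since the granularity of the text units determines how finely they can be aligned with speech model outputs. The sketch below is a minimal illustration of that granularity difference, assuming the Hugging Face transformers library; ByT5 is used here only as a convenient stand-in for a character/byte-level PLM and is not named in the abstract.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library.
# ByT5 is an assumed stand-in for a character/byte-level PLM (not named
# in the abstract); T5 illustrates subword-based tokenization.
from transformers import AutoTokenizer

subword_tok = AutoTokenizer.from_pretrained("t5-base")        # SentencePiece subwords
char_tok = AutoTokenizer.from_pretrained("google/byt5-base")  # byte-level units

sentence = "spoken question answering"

# Subword tokenization: a few pieces per word.
print(subword_tok.tokenize(sentence))

# Byte/character-level tokenization: roughly one unit per character,
# i.e. a much finer granularity to align with speech model outputs.
print(char_tok.tokenize(sentence))
```

Running the two tokenizers on the same sentence makes the mismatch concrete: the subword tokenizer emits a short sequence of pieces, while the byte-level tokenizer emits one unit per character, which changes how speech representations must be mapped onto PLM inputs.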
