Web: http://arxiv.org/abs/2201.10716

Jan. 27, 2022, 2:10 a.m. | Lu Dong, Zhi-Qiang Guo, Chao-Hong Tan, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling

cs.CL updates on arXiv.org arxiv.org

Neural network models have achieved state-of-the-art performance on
grapheme-to-phoneme (G2P) conversion. However, their performance relies on
large-scale pronunciation dictionaries, which may not be available for many
languages. Inspired by the success of the pre-trained language model BERT, this
paper proposes a pre-trained grapheme model called grapheme BERT (GBERT), which
is built by self-supervised training on a large, language-specific word list
with only grapheme information. Furthermore, two approaches are developed to
incorporate GBERT into the state-of-the-art Transformer-based G2P model, …
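
A minimal sketch (not the authors' implementation) of the grapheme-level, BERT-style self-supervised pre-training the abstract describes: each word from a grapheme-only word list is treated as a character sequence and a small encoder is trained with masked-grapheme prediction. The word list, vocabulary, model sizes, and one-mask-per-word scheme below are all illustrative assumptions, using the Hugging Face `transformers` BERT classes.

```python
import random
import torch
from transformers import BertConfig, BertForMaskedLM

# Hypothetical language-specific word list containing only spellings (graphemes).
word_list = ["hello", "world", "phoneme", "grapheme"]

# Build a character-level vocabulary with BERT-style special tokens.
specials = ["[PAD]", "[CLS]", "[SEP]", "[MASK]"]
chars = sorted({c for w in word_list for c in w})
vocab = {tok: i for i, tok in enumerate(specials + chars)}

# Small encoder; the paper's actual model sizes are not given in the abstract.
config = BertConfig(
    vocab_size=len(vocab),
    hidden_size=128,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=256,
)
model = BertForMaskedLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def encode(word):
    # Map a word to [CLS] g1 g2 ... gn [SEP] token ids.
    return [vocab["[CLS]"]] + [vocab[c] for c in word] + [vocab["[SEP]"]]

for step in range(100):  # toy training loop over single words
    ids = encode(random.choice(word_list))
    labels = [-100] * len(ids)                # -100 positions are ignored by the MLM loss
    pos = random.randrange(1, len(ids) - 1)   # mask one grapheme per word (an assumption)
    labels[pos] = ids[pos]
    ids[pos] = vocab["[MASK]"]

    out = model(input_ids=torch.tensor([ids]), labels=torch.tensor([labels]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The resulting grapheme encoder could then be combined with a Transformer-based G2P model; the abstract only states that two incorporation approaches are developed, so those are not sketched here.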

