all AI news
Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling. (arXiv:2211.12781v1 [cs.CL])
cs.CL updates on arXiv.org arxiv.org
Existing research generally treats Chinese character as a minimum unit for
representation. However, such Chinese character representation will suffer two
bottlenecks: 1) Learning bottleneck, the learning cannot benefit from its rich
internal features (e.g., radicals and strokes); and 2) Parameter bottleneck,
each individual character has to be represented by a unique vector. In this
paper, we introduce a novel representation method for Chinese characters to
break the bottlenecks, namely StrokeNet, which represents a Chinese character
by a Latinized stroke sequence …
arxiv breaking chinese machine machine translation modeling neural machine translation representation stroke translation