Nov. 24, 2022, 7:18 a.m. | Zhijun Wang, Xuebo Liu, Min Zhang

cs.CL updates on arXiv.org

Existing research generally treats the Chinese character as the minimum unit for
representation. However, such a character-level representation suffers from two
bottlenecks: 1) a learning bottleneck, in that learning cannot benefit from the
characters' rich internal features (e.g., radicals and strokes); and 2) a parameter
bottleneck, in that each individual character has to be represented by a unique
vector. In this paper, we introduce StrokeNet, a novel representation method for
Chinese characters that breaks these bottlenecks by representing a Chinese
character as a Latinized stroke sequence …
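The abstract is cut off before it explains how strokes are Latinized, so the Python sketch below only illustrates the general idea under stated assumptions. It uses the standard five basic stroke classes (横 heng, 竖 shu, 撇 pie, 点/捺 dian, 折 zhe) and maps them to the hypothetical letters `h`, `s`, `p`, `d`, `z` (pinyin initials). The names `STROKE_TO_LATIN`, `STROKE_ORDER`, `latinize_char`, and `latinize_text` are illustrative and not taken from the paper, and the stroke table covers only a handful of characters.

```python
# HYPOTHETICAL letter mapping for the five basic stroke classes
# (pinyin initials of heng, shu, pie, dian, zhe); not StrokeNet's actual scheme.
STROKE_TO_LATIN = {1: "h", 2: "s", 3: "p", 4: "d", 5: "z"}

# Stroke-class sequences in standard stroke order for a few characters
# (1=横 heng, 2=竖 shu, 3=撇 pie, 4=点/捺 dian, 5=折 zhe).
STROKE_ORDER = {
    "十": [1, 2],
    "木": [1, 2, 3, 4],
    "林": [1, 2, 3, 4, 1, 2, 3, 4],
    "口": [2, 5, 1],
    "日": [2, 5, 1, 1],
}


def latinize_char(ch: str) -> str:
    """Return the Latinized stroke sequence for a single character."""
    return "".join(STROKE_TO_LATIN[s] for s in STROKE_ORDER[ch])


def latinize_text(text: str, sep: str = " ") -> str:
    """Latinize each character, keeping character boundaries with `sep`."""
    return sep.join(latinize_char(ch) for ch in text)


if __name__ == "__main__":
    print(latinize_text("木林"))  # hspd hspdhspd
    # Every character is spelled over the same small alphabet.
    alphabet = sorted(set("".join(latinize_char(c) for c in STROKE_ORDER)))
    print(alphabet)  # ['d', 'h', 'p', 's', 'z']
```

In this toy setting a model would need only five stroke embeddings instead of one vector per character, and related characters such as 木 and 林 share visible substrings. That mirrors the two bottlenecks described above; it is not a reproduction of StrokeNet's implementation.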
