Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling. (arXiv:2211.12781v1 [cs.CL]) | allainews.com

Nov. 24, 2022, 7:18 a.m. | Zhijun Wang, Xuebo Liu, Min Zhang

cs.CL updates on arXiv.org arxiv.org

Existing research generally treats Chinese character as a minimum unit for
representation. However, such Chinese character representation will suffer two
bottlenecks: 1) Learning bottleneck, the learning cannot benefit from its rich
internal features (e.g., radicals and strokes); and 2) Parameter bottleneck,
each individual character has to be represented by a unique vector. In this
paper, we introduce a novel representation method for Chinese characters to
break the bottlenecks, namely StrokeNet, which represents a Chinese character
by a Latinized stroke sequence …

arxiv breaking chinese machine machine translation modeling neural machine translation representation stroke translation

More from arxiv.org / cs.CL updates on arXiv.org

Sparse is Enough in Fine-tuning Pre-trained Large Language Models 30 seconds ago | arxiv.org

arxiv cs.ai cs.cl cs.lg +6

On the Learnability of Watermarks for Language Models 31 seconds ago | arxiv.org

abstract arxiv cs.cl cs.cr +17

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization 31 seconds ago | arxiv.org

abstract arxiv capabilities cs.ai +14

Evaluating Generative Ad Hoc Information Retrieval 32 seconds ago | arxiv.org

abstract advances arxiv cs.cl +19

Language Models As Semantic Indexers 33 seconds ago | arxiv.org

arxiv cs.cl cs.ir cs.lg +4

Large language models can accurately predict searcher preferences 34 seconds ago | arxiv.org

abstract arxiv cs.ai cs.cl +16

On the Reliability of Watermarks for Large Language Models 35 seconds ago | arxiv.org

abstract arxiv become bots +28

A Watermark for Large Language Models 36 seconds ago | arxiv.org

abstract arxiv cs.cl cs.cr +16

CreoleVal: Multilingual Multitask Benchmarks for Creoles 37 seconds ago | arxiv.org

abstract annotated data arxiv benchmarks +14

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Data Analyst (Digital Business Analyst)

@ Activate Interactive Pte Ltd | Singapore, Central Singapore, Singapore

View on ai-jobs.net