Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models. (arXiv:2207.05928v1 [cs.CL])
July 14, 2022, 1:12 a.m. | Wenbiao Li, Rui Sun, Yunfang Wu
cs.CL updates on arXiv.org
Most Chinese pre-trained models adopt characters as the basic units for
downstream tasks. However, these models ignore the information carried by
words, leading to the loss of important semantics. In this paper, we propose
a new method to exploit word structure and integrate lexical semantics into
the character representations of pre-trained models. Specifically, we project
a word's embedding onto its internal characters' embeddings according to
similarity weights. To strengthen the word boundary information, we mix the …
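The similarity-weighted projection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the softmax normalization of the similarity scores, and the additive fusion of the projected word embedding into each character embedding are all illustrative assumptions.

```python
import numpy as np

def project_word_to_chars(word_vec, char_vecs):
    """Distribute a word's embedding over its internal characters'
    embeddings, weighted by similarity (illustrative sketch)."""
    # Cosine similarity between the word and each of its characters.
    sims = np.array([
        np.dot(word_vec, c) / (np.linalg.norm(word_vec) * np.linalg.norm(c))
        for c in char_vecs
    ])
    # Softmax-normalize the similarities into projection weights
    # (an assumed choice of normalization).
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    # Each character receives a weighted share of the word embedding,
    # fused additively here for simplicity.
    return [c + w * word_vec for c, w in zip(char_vecs, weights)]

# Example with random vectors standing in for a two-character word:
rng = np.random.default_rng(0)
word = rng.normal(size=8)
chars = [rng.normal(size=8) for _ in range(2)]
enriched = project_word_to_chars(word, chars)
```

Characters more similar to the whole word receive a larger share of the word-level embedding, so lexical semantics flow into the character representations without changing the model's character-based input units.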