July 14, 2022, 1:12 a.m. | Wenbiao Li, Rui Sun, Yunfang Wu

cs.CL updates on arXiv.org

Most Chinese pre-trained models adopt characters as the basic units for
downstream tasks. However, these models ignore the information carried by words
and thus lose some important semantics. In this paper, we propose a new method
to exploit word structure and integrate lexical semantics into the character
representations of pre-trained models. Specifically, we project a word's
embedding onto its internal characters' embeddings according to similarity
weights. To strengthen the word boundary information, we mix the …
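The projection step described in the excerpt can be sketched as follows. This is a minimal illustration only: cosine similarity with a softmax over the word's internal characters is an assumed weighting scheme, since the truncated abstract does not specify the exact similarity function, and `inject_word_semantics` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def inject_word_semantics(char_embs: torch.Tensor, word_emb: torch.Tensor) -> torch.Tensor:
    """Mix a word's embedding into its internal characters' embeddings.

    char_embs: (num_chars, dim) embeddings of the characters inside the word
    word_emb:  (dim,) embedding of the whole word

    Sketch of the similarity-weighted projection described in the abstract;
    the cosine/softmax weighting here is an assumption, not the paper's
    confirmed formulation.
    """
    # Similarity of each internal character to the whole word.
    sims = F.cosine_similarity(char_embs, word_emb.unsqueeze(0), dim=-1)  # (num_chars,)
    weights = torch.softmax(sims, dim=-1)                                 # (num_chars,)
    # Distribute the word embedding over its characters by weight and
    # add it to the original character representations.
    return char_embs + weights.unsqueeze(-1) * word_emb.unsqueeze(0)

# Toy usage: a two-character word with 8-dimensional embeddings.
chars = torch.randn(2, 8)
word = torch.randn(8)
enriched = inject_word_semantics(chars, word)
print(enriched.shape)  # torch.Size([2, 8])
```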
