June 7, 2022, 1:12 a.m. | Juuso Eronen, Michal Ptaszynski, Fumito Masui

cs.CL updates on arXiv.org

In most cases, word embeddings are learned only from raw tokens or, in some
cases, lemmas. This includes pre-trained language models like BERT. To
investigate the potential of capturing deeper relations between lexical items
and structures, and to filter out redundant information, we propose preserving
morphological, syntactic, and other types of linguistic information by
combining them with the raw tokens or lemmas. This means, for example,
including part-of-speech or dependency information within the lexical
features used. The …
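The feature combination the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: it assumes spaCy for linguistic annotation and gensim's Word2Vec for embedding training, and the `lemma_POS_dep` feature template is a hypothetical example of combining lexical items with linguistic information.

```python
# Minimal sketch (assumption, not the paper's exact method): augment each
# token/lemma with its POS tag and dependency label, then train embeddings
# over the combined features instead of raw tokens.
import spacy
from gensim.models import Word2Vec

nlp = spacy.load("en_core_web_sm")  # illustrative model choice

def to_features(text):
    """Combine each lemma with its POS tag and dependency label."""
    doc = nlp(text)
    return [f"{tok.lemma_}_{tok.pos_}_{tok.dep_}"
            for tok in doc if not tok.is_space]

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "Word embeddings capture distributional semantics.",
]
sentences = [to_features(text) for text in corpus]

# Embeddings are learned over augmented features such as "quick_ADJ_amod".
model = Word2Vec(sentences=sentences, vector_size=100, window=5, min_count=1)
print(model.wv.most_similar(sentences[0][1]))
```

One consequence of this design is that homographs are disambiguated by their annotations (e.g., a noun and a verb spelled identically receive distinct embeddings), at the cost of a larger vocabulary and sparser counts per feature.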

Tags: arxiv, cyberbullying detection, performance, word embeddings
