April 24, 2023, 12:48 a.m. | Anik Saha, Alex Gittens, Bulent Yener

cs.CL updates on arXiv.org

Pre-trained contextual language models are ubiquitously employed for language
understanding tasks, but are unsuitable for resource-constrained systems.
Noncontextual word embeddings are an efficient alternative in these settings.
Such methods typically use one vector to encode multiple different meanings of
a word, and incur errors due to polysemy. This paper proposes a two-stage
method to distill multiple word senses from a pre-trained language model (BERT)
by using attention over the senses of a word in a context and transferring this
sense …
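To make the idea of "attention over the senses of a word in a context" concrete, here is a minimal sketch of how such a distillation setup could look. The abstract does not give the exact architecture or loss, so the class name, the number of senses per word, the scaled dot-product attention, and the MSE distillation objective below are all illustrative assumptions rather than the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiSenseEmbedding(nn.Module):
    """Non-contextual multi-sense word embeddings distilled from a
    contextual teacher such as BERT (hypothetical sketch). Each word
    gets K candidate sense vectors; during training, attention over
    those senses is computed against the teacher's contextual
    embedding of the word in a sentence."""

    def __init__(self, vocab_size: int, num_senses: int = 4, dim: int = 768):
        super().__init__()
        # (vocab, K, d): K sense vectors per word, small random init
        self.sense_vectors = nn.Parameter(
            torch.randn(vocab_size, num_senses, dim) * 0.02
        )

    def forward(self, word_ids: torch.Tensor, teacher_vecs: torch.Tensor):
        """
        word_ids:     (batch,)   ids of the target words
        teacher_vecs: (batch, d) contextual embeddings from the teacher
        Returns the attention-pooled student vector and the sense weights.
        """
        senses = self.sense_vectors[word_ids]                # (batch, K, d)
        # dot-product attention of the context against each sense
        scores = torch.einsum("bkd,bd->bk", senses, teacher_vecs)
        attn = F.softmax(scores / senses.size(-1) ** 0.5, dim=-1)
        pooled = torch.einsum("bk,bkd->bd", attn, senses)    # (batch, d)
        return pooled, attn


def distillation_loss(student_vec: torch.Tensor, teacher_vec: torch.Tensor):
    # Transfer the teacher's contextual sense information into the
    # attention-pooled student vector; plain MSE is a placeholder choice.
    return F.mse_loss(student_vec, teacher_vec)
```

At inference time only the stored sense vectors are needed, so the student stays non-contextual and cheap to serve; the teacher is used solely to supervise which sense each occurrence should pull toward during training.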

