Word Sense Induction with Knowledge Distillation from BERT. (arXiv:2304.10642v1 [cs.CL])
cs.CL updates on arXiv.org
Pre-trained contextual language models are ubiquitously employed for language
understanding tasks, but are unsuitable for resource-constrained systems.
Noncontextual word embeddings are an efficient alternative in these settings.
Such methods typically use a single vector to encode multiple meanings of
a word, and so incur errors due to polysemy. This paper proposes a two-stage
method to distill multiple word senses from a pre-trained language model (BERT)
by using attention over the senses of a word in a context and transferring this
sense …
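
The abstract is cut off above, but the mechanism it names, attention over a word's candidate sense vectors conditioned on its context, can be sketched. Below is a minimal, hypothetical illustration in PyTorch: the function name, the scaled dot-product scoring, and the MSE distillation target are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def sense_attention(sense_vectors: torch.Tensor, context_vector: torch.Tensor):
    """Attend over a word's sense embeddings given one contextual occurrence.

    sense_vectors:  (K, d) learnable sense embeddings for a single word type.
    context_vector: (d,)   contextual embedding of the word occurrence,
                           e.g. a BERT hidden state for that token.
    Returns attention weights (K,) and the sense-weighted vector (d,).
    """
    d = sense_vectors.size(-1)
    scores = sense_vectors @ context_vector / d ** 0.5  # scaled dot-product
    weights = F.softmax(scores, dim=-1)
    distilled = weights @ sense_vectors
    return weights, distilled

# Toy usage; random tensors stand in for real BERT output.
torch.manual_seed(0)
K, d = 3, 768                    # 3 candidate senses, BERT-base hidden size
senses = torch.randn(K, d, requires_grad=True)
context = torch.randn(d)         # in practice: the teacher's token embedding
weights, student_vec = sense_attention(senses, context)
# A distillation loss would pull the student's sense mixture toward the
# teacher's contextual representation, for example:
loss = F.mse_loss(student_vec, context)
loss.backward()                  # gradients flow into the sense vectors
```

Under these assumptions, only the K sense vectors per word type need to be stored at inference time, which is what makes the distilled embeddings suitable for the resource-constrained settings the abstract describes.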