Web: http://arxiv.org/abs/2110.02432

Sept. 20, 2022, 1:14 a.m. | Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria

cs.CL updates on arXiv.org

We propose a new approach, Knowledge Distillation using Optimal Transport
(KNOT), to distill natural language semantic knowledge from multiple
teacher networks to a student network. KNOT aims to train a (global) student
model by learning to minimize the optimal transport cost between the
probability distribution it assigns over the labels and the weighted sum of
probabilities predicted by the (local) teacher models, under the constraint
that the student model does not have access to the teacher models' parameters
or training data. …
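
To make the objective above concrete, here is a minimal sketch of one way such a loss could be computed for a single example. It is not the authors' implementation: the abstract does not specify the solver, the cost matrix, or the teacher weights, so the Sinkhorn iteration, the 0/1 label cost, the equal weights, and the function names weighted_teacher_target and sinkhorn_ot_cost are all assumptions made for illustration.

```python
import math


def weighted_teacher_target(teacher_probs, weights):
    """Weighted sum of the teachers' label distributions.

    Weights are normalized to sum to 1 (an assumption for this sketch).
    """
    total = sum(weights)
    n_labels = len(teacher_probs[0])
    return [
        sum(w / total * probs[k] for w, probs in zip(weights, teacher_probs))
        for k in range(n_labels)
    ]


def sinkhorn_ot_cost(p, q, cost, reg=0.1, n_iters=500):
    """Approximate optimal transport cost between label distributions p and q.

    Uses entropy-regularized Sinkhorn iterations, chosen here only because
    they are short to write; the abstract does not name a solver.
    """
    n = len(p)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the two marginals.
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(n))


# Two hypothetical teachers predicting over three labels, weighted equally.
teacher_probs = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]]
target = weighted_teacher_target(teacher_probs, [0.5, 0.5])

# Hypothetical 0/1 cost: moving probability mass between different labels costs 1.
cost = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]

close_student = [0.5, 0.3, 0.2]  # student prediction near the teacher mixture
far_student = [0.1, 0.1, 0.8]    # student prediction far from it

loss_close = sinkhorn_ot_cost(close_student, target, cost)
loss_far = sinkhorn_ot_cost(far_student, target, cost)
print(target, loss_close, loss_far)
```

Only the teachers' predicted probabilities enter the computation, which matches the constraint that the student sees neither teacher parameters nor training data. In a training loop, a loss like this would be minimized with respect to the student's parameters. A cost matrix that reflects semantic distance between labels could replace the 0/1 cost used here.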


