Sinkhorn Distance Minimization for Knowledge Distillation

Feb. 28, 2024, 5:41 a.m. | Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li

cs.LG updates on arXiv.org

arXiv:2402.17110v1 Announce Type: new
Abstract: Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures, including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences. However, due to limitations inherent in their assumptions and definitions, these measures fail to deliver effective supervision when little distribution overlap exists between the teacher and the student. In this paper, we show that the aforementioned KL, RKL, and JS divergences respectively suffer from …
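
The overlap problem named in the abstract can be illustrated with a toy example. The sketch below is not the authors' method (the abstract is truncated before it is described); it only compares the JS divergence with an entropy-regularized optimal transport (Sinkhorn) cost on hand-picked distributions. The five-token vocabulary, the cost matrix |i - j|, the regularization strength `eps`, and all function names are assumptions made for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute zero by convention.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def sinkhorn_distance(p, q, cost, eps=0.5, n_iters=200):
    """Transport cost of the entropy-regularized plan found by Sinkhorn iterations."""
    n, m = len(p), len(q)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals p and q.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is P[i][j] = u[i] * K[i][j] * v[j]; return its total cost.
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Hypothetical 5-token vocabulary; the cost |i - j| is an illustrative assumption.
cost = [[abs(i - j) for j in range(5)] for i in range(5)]
teacher = [1.0, 0.0, 0.0, 0.0, 0.0]
student_near = [0.0, 1.0, 0.0, 0.0, 0.0]  # no overlap, but close to the teacher
student_far = [0.0, 0.0, 0.0, 0.0, 1.0]   # no overlap, and far from the teacher

js_near = js_divergence(teacher, student_near)
js_far = js_divergence(teacher, student_far)
sd_near = sinkhorn_distance(teacher, student_near, cost)
sd_far = sinkhorn_distance(teacher, student_far, cost)

print(f"JS:       near={js_near:.4f}  far={js_far:.4f}")
print(f"Sinkhorn: near={sd_near:.4f}  far={sd_far:.4f}")
```

In this toy setup the JS divergence saturates at log 2 for both students, so it cannot tell the near student from the far one, while the Sinkhorn cost is roughly 1 for the near student and roughly 4 for the far one. KL and RKL are not shown because they are infinite when the supports are disjoint.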
