Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
June 21, 2024, 4:41 a.m. | Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang
cs.CL updates on arXiv.org
Abstract: Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. In particular, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, adversely affecting generalization …
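The abstract is truncated, but the sequence-level KD setup it builds on can be sketched: the teacher generates a rationale-bearing output sequence, and the student is fine-tuned on that full sequence with a standard language-modeling loss rather than matching per-token logits. Below is a minimal sketch assuming Hugging Face Transformers; the model names, toy prompt, and single-step training loop are illustrative placeholders, not the paper's method (which adds multi-stage balancing for long-tailed data).

```python
# Minimal sequence-level KD (SeqKD) sketch; models and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "gpt2-large"  # assumed teacher, shares a tokenizer with the student
student_name = "gpt2"        # assumed student

tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)

prompt = "Q: If a train travels 60 miles in 1.5 hours, what is its speed?\nA:"

# 1) Teacher generates a rationale-bearing answer sequence.
with torch.no_grad():
    inputs = tok(prompt, return_tensors="pt")
    gen = teacher.generate(**inputs, max_new_tokens=64, do_sample=False)

# 2) Student is fine-tuned on the teacher's generated sequence (prompt + rationale),
#    masking the prompt tokens out of the loss.
labels = gen.clone()
labels[:, : inputs["input_ids"].shape[1]] = -100
out = student(input_ids=gen, labels=labels)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
out.loss.backward()
optimizer.step()
```

In practice this loop runs over a corpus of teacher-generated rationales; the long-tail issue the paper targets arises when some task types dominate that corpus while rare ones are underrepresented.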