June 21, 2024, 4:41 a.m. | Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

cs.CL updates on arXiv.org arxiv.org

arXiv:2406.13114v1 Announce Type: new
Abstract: Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. In particular, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, adversely affecting generalization …
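
To make the idea of sequence-level KD concrete, here is a minimal sketch of the general recipe the abstract describes: the teacher generates a rationale-bearing target sequence, and the student is fine-tuned with token-level cross-entropy on that sequence. The model names, prompt template, and the assumption that teacher and student share a tokenizer are placeholders for illustration, not the paper's actual setup.

```python
# Minimal sequence-level KD sketch (illustrative only):
# teacher generates rationale + answer, student imitates the full sequence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "teacher-llm"   # placeholder: any large causal LM
student_name = "student-llm"   # placeholder: a smaller causal LM

# Assumption: teacher and student share a tokenizer; otherwise re-tokenize
# the teacher's decoded output with the student's tokenizer.
tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(question: str) -> float:
    # 1) Teacher produces a rationale-based reasoning trace for the question.
    prompt = f"Question: {question}\nLet's think step by step.\n"
    with torch.no_grad():
        out = teacher.generate(
            **tok(prompt, return_tensors="pt"), max_new_tokens=128
        )
    target_text = tok.decode(out[0], skip_special_tokens=True)

    # 2) Student is trained with cross-entropy to reproduce the whole
    #    rationale-bearing sequence, not just the final answer.
    batch = tok(target_text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The long-tail difficulty the abstract points to arises at step 1: rare task types contribute few distilled sequences, so the student sees them too seldom to generalize.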

