all AI news
Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks. (arXiv:2210.04834v2 [cs.CL] UPDATED)
Oct. 12, 2022, 1:18 a.m. | Charith Peris, Lizhen Tan, Thomas Gueudre, Turan Gojayev, Pan Wei, Gokmen Oz
cs.CL updates on arXiv.org arxiv.org
Teacher-student knowledge distillation is a popular technique for compressing
today's prevailing large language models into manageable sizes that fit
low-latency downstream applications. Both the teacher and the choice of
transfer set used for distillation are crucial ingredients in creating a high
quality student. Yet, the generic corpora used to pretrain the teacher and the
corpora associated with the downstream target domain are often significantly
different, which raises a natural question: should the student be distilled
over the generic corpora, so …
More from arxiv.org / cs.CL updates on arXiv.org
Benchmarking LLMs via Uncertainty Quantification
2 days, 13 hours ago |
arxiv.org
CARE: Extracting Experimental Findings From Clinical Literature
2 days, 13 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
RL Analytics - Content, Data Science Manager
@ Meta | Burlingame, CA
Research Engineer
@ BASF | Houston, TX, US, 77079