Less is More: Task-aware Layer-wise Distillation for Language Model Compression. (arXiv:2210.01351v2 [cs.CL] UPDATED)
Oct. 6, 2022, 1:13 a.m. | Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
cs.LG updates on arXiv.org
Layer-wise distillation is a powerful tool for compressing large models (i.e., teacher models) into small ones (i.e., student models). The student distills knowledge from the teacher by mimicking the teacher's hidden representations at every intermediate layer. However, layer-wise distillation is difficult: since the student has a smaller model capacity than the teacher, it is often under-fitted. Furthermore, the teacher's hidden representations contain redundant information that the student does not necessarily need for learning the target task. …
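For context, the sketch below shows the standard layer-wise distillation objective the abstract describes (hidden-state matching between mapped layers), not the paper's task-aware variant. The class name, the uniform layer mapping, and the per-layer linear projections for mismatched hidden sizes are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of plain layer-wise distillation: the student mimics the
# teacher's hidden states at matched intermediate layers. Assumes the student
# has fewer layers than the teacher; hypothetical names and layer mapping.
import torch
import torch.nn as nn

class LayerwiseDistillLoss(nn.Module):
    def __init__(self, student_hidden: int, teacher_hidden: int, num_student_layers: int):
        super().__init__()
        # One projection per student layer so hidden sizes may differ.
        self.proj = nn.ModuleList(
            nn.Linear(student_hidden, teacher_hidden) for _ in range(num_student_layers)
        )
        self.mse = nn.MSELoss()

    def forward(self, student_hiddens, teacher_hiddens):
        # student_hiddens: list of [batch, seq, student_hidden], one per student layer
        # teacher_hiddens: list of [batch, seq, teacher_hidden], one per teacher layer
        stride = len(teacher_hiddens) // len(student_hiddens)  # uniform layer mapping
        loss = 0.0
        for i, (proj, s_h) in enumerate(zip(self.proj, student_hiddens)):
            t_h = teacher_hiddens[(i + 1) * stride - 1]      # matched teacher layer
            loss = loss + self.mse(proj(s_h), t_h.detach())  # teacher stays frozen
        return loss / len(student_hiddens)
```

In practice this term is added to the task loss, e.g. `total = task_loss + alpha * distill_loss`; the paper's contribution is filtering out the teacher-side information the student does not need for the target task.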