Less is More: Task-aware Layer-wise Distillation for Language Model Compression. (arXiv:2210.01351v2 [cs.CL] UPDATED)
Oct. 6, 2022, 1:16 a.m. | Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
cs.CL updates on arXiv.org arxiv.org
Layer-wise distillation is a powerful tool for compressing large models (i.e., teacher models) into small ones (i.e., student models). The student distills knowledge from the teacher by mimicking the teacher's hidden representations at every intermediate layer. However, layer-wise distillation is difficult: because the student has a smaller model capacity than the teacher, it is often under-fitted. Furthermore, the teacher's hidden representations contain redundant information that the student does not necessarily need for learning the target task. …
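
The abstract describes the core mechanism of layer-wise distillation: the student mimics the teacher's hidden representations at intermediate layers. Below is a minimal, generic sketch of that mechanism, assuming a PyTorch setup. The module names, the uniform student-to-teacher layer mapping, and the loss weighting are illustrative assumptions; the task-aware filters that the paper itself proposes (TED) are not shown.

```python
# Generic layer-wise distillation sketch (illustrative; not the paper's TED method).
import torch
import torch.nn as nn

class TinyTransformerStack(nn.Module):
    """Stand-in for a stack of hidden layers; returns all intermediate states."""
    def __init__(self, num_layers: int, hidden_dim: int):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers))

    def forward(self, x):
        states = []
        for layer in self.layers:
            x = torch.relu(layer(x))
            states.append(x)
        return states  # one hidden representation per layer

def layerwise_distill_loss(student_states, teacher_states, layer_map):
    """MSE between each student layer and the teacher layer it mimics."""
    loss = 0.0
    for s_idx, t_idx in layer_map.items():
        loss = loss + nn.functional.mse_loss(student_states[s_idx], teacher_states[t_idx])
    return loss / len(layer_map)

# Toy usage: a 12-layer teacher distilled into a 4-layer student.
hidden_dim = 64
teacher = TinyTransformerStack(num_layers=12, hidden_dim=hidden_dim).eval()
student = TinyTransformerStack(num_layers=4, hidden_dim=hidden_dim)
layer_map = {0: 2, 1: 5, 2: 8, 3: 11}  # student layer -> teacher layer (uniform mapping, an assumption)

x = torch.randn(8, hidden_dim)  # batch of dummy inputs
with torch.no_grad():
    teacher_states = teacher(x)
student_states = student(x)

distill_loss = layerwise_distill_loss(student_states, teacher_states, layer_map)
# In practice this term is added to the task loss, e.g. total = task_loss + alpha * distill_loss.
print(distill_loss.item())
```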