Web: http://arxiv.org/abs/2201.08539

Jan. 24, 2022, 2:10 a.m. | Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang

cs.LG updates on arXiv.org arxiv.org

Recently, large pre-trained models have significantly improved the
performance of various Natural Language Processing (NLP) tasks, but they are
expensive to serve due to long serving latency and large memory usage.
Knowledge distillation has attracted increasing interest as one of the most
effective methods for compressing these models. However, existing
distillation methods have not yet addressed the unique challenges of model
serving in datacenters, such as handling fast-evolving models, considering
serving performance, and optimizing …
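For context on the technique the abstract builds on, below is a minimal sketch of generic soft-label knowledge distillation (teacher-student training with a temperature-scaled KL term), not the paper's own framework; the model sizes, temperature, and weighting factor are illustrative assumptions.

```python
# Minimal knowledge-distillation loss sketch (PyTorch).
# This illustrates generic soft-label distillation, not the method in arXiv:2201.08539.
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-label KL loss (teacher -> student) with hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradients stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    batch, num_classes = 8, 4
    # Stand-ins for a large teacher and a small student classifier head.
    teacher = nn.Linear(16, num_classes)
    student = nn.Linear(16, num_classes)
    x = torch.randn(batch, 16)
    labels = torch.randint(0, num_classes, (batch,))
    with torch.no_grad():
        teacher_logits = teacher(x)
    loss = distillation_loss(student(x), teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In practice the teacher is a large pre-trained model kept frozen, while only the smaller student is trained against the blended loss.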

Tags: arxiv, framework, hardware, language, language models, models
