Jan. 24, 2022, 1:43 p.m. | Remi Ouazan Reboul

Towards Data Science - Medium towardsdatascience.com

This article is part 2 of a two-part series on distilling BERT-like models in the fashion of DistilBERT. For part one, you may follow this link. If, however, you feel you already have a good grasp of DistilBERT's distillation method, feel free to skip it.

Much like chemical distillation, we’re going to extract from our model what matters: knowledge. Photo by Elevate on Unsplash

Recap

In case you haven’t noticed, machine learning models have been getting larger and …
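Since the excerpt cuts off here, a brief refresher on the method part one covers may help: DistilBERT-style distillation trains a smaller student to match a larger teacher's output distribution via a temperature-scaled soft-target loss, blended with the usual supervised loss (the full DistilBERT objective also adds a cosine embedding term). The sketch below is illustrative only; the function name and hyperparameters are assumptions, not code from the article.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative soft-target distillation loss (not the article's code).

    Blends a temperature-scaled KL divergence between the student's and
    teacher's output distributions with a standard hard-label cross-entropy.
    """
    # Soft targets: KL(student || teacher) at temperature T, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

A higher temperature T softens the teacher's distribution so the student can learn from the relative probabilities of incorrect classes, which is the "knowledge" being transferred.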

bert code data science distillation machine learning nlp transfer learning
