May 27, 2023, 1:25 p.m. | /u/kkimdev

Machine Learning | www.reddit.com

There has been a lot of distillation research and application on BERT and its variants, so I was wondering: why don't we see much distillation research on GPT-3-scale LLMs?

Can anyone familiar with LLM distillation share some insights? Thanks in advance!
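For context, the BERT-era work the question refers to (e.g. DistilBERT) mostly uses soft-label knowledge distillation in the style of Hinton et al. (2015). A minimal sketch in PyTorch of that loss, assuming teacher and student produce logits over the same label or vocabulary space; the function name and default hyperparameters here are illustrative, not from any particular paper:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label knowledge distillation loss (Hinton et al., 2015 style).

    Blends KL divergence between temperature-softened teacher and student
    distributions with ordinary cross-entropy on the hard labels.
    """
    # Soft targets: KL(teacher || student) at temperature T.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against ground-truth class indices.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

For an autoregressive model the same loss is applied per token position, with the distribution taken over the full vocabulary.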
