April 19, 2024, 12:11 a.m. | /u/miladink

Machine Learning www.reddit.com

So, nowadays, everyone is distilling rationales gathered from a large language model into another, relatively smaller model. However, I remember that in the old days we would train the small network to match the logits of the large network when doing distillation. Has this been forgotten, or was it tried and found not to work today?
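For concreteness, the logit-matching recipe the question refers to is the classic knowledge-distillation loss of Hinton et al. (2015): minimize the KL divergence between temperature-softened teacher and student distributions. Below is a minimal PyTorch sketch; the names `teacher_logits`, `student_logits`, and the temperature value are illustrative assumptions, not from the post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic logit-matching KD loss: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 as in
    Hinton et al. (2015) so gradients stay comparable across T."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-class output space.
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

In practice this term is usually mixed with the ordinary cross-entropy loss on the hard labels, weighted by a hyperparameter.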

