May 4, 2023, 10 p.m. | /u/Dapper_Cherry1025

Machine Learning | www.reddit.com

paper: [\[2305.02301\] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (arxiv.org)](https://arxiv.org/abs/2305.02301)

Abstract:

> Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains …
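For context on the standard distillation baseline the abstract mentions (training a smaller task-specific model on LLM-generated labels), here is a minimal sketch using Hugging Face `transformers`. The student model name, the toy distillation set, and all hyperparameters are illustrative assumptions, not the paper's actual setup; the targets would normally come from prompting a larger teacher LLM on unlabeled task inputs.

```python
# Sketch only: finetune a small seq2seq "student" on labels generated by a
# larger LLM (conventional task-specific distillation, the baseline the
# paper compares against). Model, data, and hyperparameters are assumed.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

student_name = "t5-small"  # assumed small student model
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForSeq2SeqLM.from_pretrained(student_name)

# Toy distillation set: task inputs paired with answers produced by a
# teacher LLM. Hard-coded here purely for illustration.
raw = Dataset.from_dict({
    "input": ["question: is the sky blue on a clear day?",
              "question: is fire cold?"],
    "target": ["yes", "no"],  # LLM-generated labels
})

def preprocess(batch):
    enc = tokenizer(batch["input"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(text_target=batch["target"],
                              truncation=True, max_length=8)["input_ids"]
    return enc

train = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="distilled-student",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

Seq2SeqTrainer(
    model=student,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=student),
).train()
```

The paper's contribution is a change to this recipe that needs less of such training data and smaller students; see the linked arXiv page for the full method.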

