[R] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
May 4, 2023, 10 p.m. | /u/Dapper_Cherry1025
Machine Learning www.reddit.com
Abstract:
> Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains …
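The mechanism the abstract names, Distilling step-by-step, trains a small student on two targets per example: the task label and an LLM-generated rationale, combined in a multi-task loss. Below is a minimal sketch of that idea, not the authors' code; the student model name, the task prefixes, and the weight `lambda_rationale` are illustrative assumptions, and the example inputs are placeholders.

```python
# Minimal sketch of the multi-task idea behind Distilling step-by-step:
# the student is finetuned to predict the label and, as an auxiliary task,
# to generate the LLM-provided rationale for the same input.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model_name = "t5-small"  # assumed student; the paper uses T5 variants
tok = T5TokenizerFast.from_pretrained(model_name)
student = T5ForConditionalGeneration.from_pretrained(model_name)

def step_by_step_loss(question, label, rationale, lambda_rationale=1.0):
    """Combined loss: label prediction plus rationale generation."""
    def seq2seq_loss(prefix, target):
        enc = tok(prefix + question, return_tensors="pt", truncation=True)
        dec = tok(target, return_tensors="pt", truncation=True)
        return student(**enc, labels=dec.input_ids).loss

    # Task prefixes let the same input map to two different outputs.
    label_loss = seq2seq_loss("[label] ", label)
    rationale_loss = seq2seq_loss("[rationale] ", rationale)
    return label_loss + lambda_rationale * rationale_loss

# One illustrative training step; wrap in an optimizer loop in practice.
loss = step_by_step_loss(
    question="Sammy wanted to go to where the people were. Where might he go?",
    label="populated areas",
    rationale="The answer must be a place with many people.",
)
loss.backward()
```

Because the rationales act as extra supervision rather than extra inputs, the student needs them only at training time; at inference it is prompted for the label alone.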