[R] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
May 4, 2023, 10 p.m. | /u/Dapper_Cherry1025
Machine Learning www.reddit.com
Abstract:
> Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains …
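The mechanism the abstract names, Distilling step-by-step, trains a small student on two targets per example: the task label and an LLM-generated rationale, combined in a multi-task loss. Below is a minimal sketch of that idea, not the authors' code; the student model name, the task prefixes, and the weight `lambda_rationale` are illustrative assumptions, and the example inputs are placeholders.

```python
# Minimal sketch of the multi-task idea behind Distilling step-by-step:
# the student is finetuned to predict the label and, as an auxiliary task,
# to generate the LLM-provided rationale for the same input.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model_name = "t5-small"  # assumed student; the paper uses T5 variants
tok = T5TokenizerFast.from_pretrained(model_name)
student = T5ForConditionalGeneration.from_pretrained(model_name)

def step_by_step_loss(question, label, rationale, lambda_rationale=1.0):
    """Combined loss: label prediction plus rationale generation."""
    def seq2seq_loss(prefix, target):
        enc = tok(prefix + question, return_tensors="pt", truncation=True)
        dec = tok(target, return_tensors="pt", truncation=True)
        return student(**enc, labels=dec.input_ids).loss

    # Task prefixes let the same input map to two different outputs.
    label_loss = seq2seq_loss("[label] ", label)
    rationale_loss = seq2seq_loss("[rationale] ", rationale)
    return label_loss + lambda_rationale * rationale_loss

# One illustrative training step; wrap in an optimizer loop in practice.
loss = step_by_step_loss(
    question="Sammy wanted to go to where the people were. Where might he go?",
    label="populated areas",
    rationale="The answer must be a place with many people.",
)
loss.backward()
```

Because the rationales act as extra supervision rather than extra inputs, the student needs them only at training time; at inference it is prompted for the label alone.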