Dec. 24, 2023, 5:33 p.m. | /u/APaperADay

Machine Learning www.reddit.com

**Paper**: [https://arxiv.org/abs/2312.06585](https://arxiv.org/abs/2312.06585)

**Abstract**:

>Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call **ReST^(EM)**, where we …
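The abstract is truncated before it spells out the method, but the setup it describes (expectation-maximization-style self-training with verifiable scalar feedback) suggests a generate-filter-fine-tune loop. Below is a minimal toy sketch of that loop under that assumption; `generate`, `is_correct`, and `fine_tune` are illustrative stubs invented here, not the paper's actual code or API.

```python
# Toy sketch of an EM-style self-training loop (assumed from the abstract):
# E-step: sample candidate solutions and keep only verified-correct ones;
# M-step: fine-tune the model on its own filtered outputs.
import random

def generate(model, problem, k):
    # Stub "model": just a probability of emitting the right answer.
    return [problem["answer"] if random.random() < model["accuracy"] else "wrong"
            for _ in range(k)]

def is_correct(problem, solution):
    # Binary feedback, e.g. checking a math answer for correctness.
    return solution == problem["answer"]

def fine_tune(model, dataset):
    # Stub M-step: nudge accuracy up in proportion to the amount of
    # verified data collected (capped so accuracy stays in [0, 1]).
    boost = min(0.1, 0.01 * len(dataset))
    return {"accuracy": min(1.0, model["accuracy"] + boost)}

def rest_em_sketch(model, problems, iterations=3, k=8):
    for _ in range(iterations):
        # E-step: generate and filter by the verifier.
        dataset = [(p, s) for p in problems
                   for s in generate(model, p, k) if is_correct(p, s)]
        # M-step: fine-tune on the filtered samples.
        model = fine_tune(model, dataset)
    return model

random.seed(0)
problems = [{"answer": str(i * i)} for i in range(10)]
final = rest_em_sketch({"accuracy": 0.3}, problems)
```

Since only verified samples reach the M-step, the stub's accuracy can only stay flat or improve; the real method replaces these stubs with LM sampling and supervised fine-tuning.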

