March 15, 2024, 8:42 p.m. | /u/FallMindless3563

Machine Learning www.reddit.com

Hey all,

After reading the Self-Rewarding Language Models paper from the team at Meta, we felt it was very approachable and reproducible, so we spent some time implementing it.



The scripts provided take any base model and put it in a loop of (a rough code sketch of the loop follows the list):

1) Supervised fine-tuning on an initial dataset

2) Generating new prompts using the SFT model

3) Generating N responses per prompt

4) Scoring each generated response from 1 to 5, with the model judging its own outputs

5) Running DPO on preference pairs built from the model's own rewards.
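To make the loop concrete, here's a minimal structural sketch of how the five steps fit together. Everything here is a hypothetical placeholder rather than our actual scripts: the callables (`sft_train`, `generate_prompts`, `generate`, `score`, `dpo_train`), the `PreferencePair` type, and the pair-building rule (best-scored vs. worst-scored response) are just assumptions to show the wiring.

```python
"""Structural sketch of the self-rewarding loop described above.

All callables passed in (sft_train, generate_prompts, generate, score, dpo_train)
are hypothetical stand-ins for real models/scripts; this only illustrates how the
five steps connect and how preference pairs could be built for DPO.
"""
from dataclasses import dataclass
from typing import Callable, List
import random


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # highest self-assigned score
    rejected: str  # lowest self-assigned score


def self_rewarding_loop(
    base_model,
    seed_dataset,
    sft_train: Callable,                         # step 1: SFT on the seed data
    generate_prompts: Callable,                  # step 2: new prompts from the SFT model
    generate: Callable[[object, str], str],      # step 3: sample one response
    score: Callable[[object, str, str], float],  # step 4: model-as-judge, 1-5
    dpo_train: Callable,                         # step 5: DPO on the self-rewards
    iterations: int = 2,
    n_responses: int = 4,
):
    model = sft_train(base_model, seed_dataset)
    for _ in range(iterations):
        prompts = generate_prompts(model)
        pairs: List[PreferencePair] = []
        for prompt in prompts:
            responses = [generate(model, prompt) for _ in range(n_responses)]
            scored = sorted((score(model, prompt, r), r) for r in responses)
            # Keep the prompt only if there is a clear winner and loser.
            if scored[-1][0] > scored[0][0]:
                pairs.append(
                    PreferencePair(prompt, chosen=scored[-1][1], rejected=scored[0][1])
                )
        model = dpo_train(model, pairs)
    return model


if __name__ == "__main__":
    # Toy dry run with dummy stand-ins, just to show the control flow.
    rng = random.Random(0)
    run = self_rewarding_loop(
        base_model="base",
        seed_dataset=[],
        sft_train=lambda m, d: m,
        generate_prompts=lambda m: ["What is DPO?", "Explain SFT in one line."],
        generate=lambda m, p: f"{p} -> draft {rng.randint(0, 99)}",
        score=lambda m, p, r: rng.uniform(1, 5),
        dpo_train=lambda m, pairs: m,
        iterations=1,
    )
    print(run)
```

In a real run, steps 1 and 5 would be handled by an actual SFT and DPO training setup (e.g. something like trl's SFTTrainer/DPOTrainer), and the scoring step would prompt the model itself with a judging template rather than returning a random number.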

​ …
