March 15, 2024, 8:42 p.m. | /u/FallMindless3563

Machine Learning www.reddit.com

Hey all,

After reading the Self-Rewarding Language Models paper from the team at Meta, we felt it was very approachable and reproducible, so we spent some time implementing it.



The scripts provided take any base model and put it in a loop of (a rough code sketch of the loop follows the list):

1) Supervised fine-tuning on an initial dataset

2) Generating new prompts using the SFT model

3) Generating N responses per prompt

4) Scoring each generated response from 1 to 5, with the model judging its own outputs

5) Running DPO on preference pairs built from the model's own rewards.
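To make the loop concrete, here's a minimal structural sketch of how the five steps fit together. Everything here is a hypothetical placeholder rather than our actual scripts: the callables (`sft_train`, `generate_prompts`, `generate`, `score`, `dpo_train`), the `PreferencePair` type, and the pair-building rule (best-scored vs. worst-scored response) are just assumptions to show the wiring.

```python
"""Structural sketch of the self-rewarding loop described above.

All callables passed in (sft_train, generate_prompts, generate, score, dpo_train)
are hypothetical stand-ins for real models/scripts; this only illustrates how the
five steps connect and how preference pairs could be built for DPO.
"""
from dataclasses import dataclass
from typing import Callable, List
import random


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # highest self-assigned score
    rejected: str  # lowest self-assigned score


def self_rewarding_loop(
    base_model,
    seed_dataset,
    sft_train: Callable,                         # step 1: SFT on the seed data
    generate_prompts: Callable,                  # step 2: new prompts from the SFT model
    generate: Callable[[object, str], str],      # step 3: sample one response
    score: Callable[[object, str, str], float],  # step 4: model-as-judge, 1-5
    dpo_train: Callable,                         # step 5: DPO on the self-rewards
    iterations: int = 2,
    n_responses: int = 4,
):
    model = sft_train(base_model, seed_dataset)
    for _ in range(iterations):
        prompts = generate_prompts(model)
        pairs: List[PreferencePair] = []
        for prompt in prompts:
            responses = [generate(model, prompt) for _ in range(n_responses)]
            scored = sorted((score(model, prompt, r), r) for r in responses)
            # Keep the prompt only if there is a clear winner and loser.
            if scored[-1][0] > scored[0][0]:
                pairs.append(
                    PreferencePair(prompt, chosen=scored[-1][1], rejected=scored[0][1])
                )
        model = dpo_train(model, pairs)
    return model


if __name__ == "__main__":
    # Toy dry run with dummy stand-ins, just to show the control flow.
    rng = random.Random(0)
    run = self_rewarding_loop(
        base_model="base",
        seed_dataset=[],
        sft_train=lambda m, d: m,
        generate_prompts=lambda m: ["What is DPO?", "Explain SFT in one line."],
        generate=lambda m, p: f"{p} -> draft {rng.randint(0, 99)}",
        score=lambda m, p, r: rng.uniform(1, 5),
        dpo_train=lambda m, pairs: m,
        iterations=1,
    )
    print(run)
```

In a real run, steps 1 and 5 would be handled by an actual SFT and DPO training setup (e.g. something like trl's SFTTrainer/DPOTrainer), and the scoring step would prompt the model itself with a judging template rather than returning a random number.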

​ …
